Hi, I'm

Jeff Brennan

data engineer

I am passionate about improving healthcare outcomes by building robust pipelines, proactive monitoring, and devtools that empower data practitioners. I have a background in public health and epidemiology, where I found a love for using technology to create population-level insights that inform clinical decisions. I work remotely as a data engineer for Medisolv and am based in Brooklyn, NY.

In my spare time, I contribute to the mrpowers-io collection of devtools (1M+ monthly downloads) for working with big data and work on a dashboard to track the impact of our efforts.

Skills

Here are some of the technologies that I have been working with recently:
  • Python (advanced)
  • SQL (advanced)
  • R (advanced)
  • Scala (intermediate)
  • PySpark
  • Dagster
  • Docker
  • Plotly Dash
  • dbt
  • Polars
  • Databricks Workflows
  • Pydantic

Experience

Data Engineer - Medisolv
Jul 2023 - present

Oversee the ingestion and transformation of patient data (10B+ records/week, 150TB+ data lake) for hundreds of hospital clients

  • Architected the core streaming pipeline that powers real-time reporting on key business metrics, performance trend alerts, and automated responses to common errors
  • Created a CLI to generate Databricks Workflows - enabling our team to programmatically tailor cluster configurations for clients with thousands to millions of patients
  • Led the migration of 1,500+ Azure Data Factory (ADF) pipelines to a source-controlled repository. Implemented CI/CD in Azure DevOps and automated pipeline validation using Pydantic and Pytest
  • Developed a Spark metrics parser using Plotly Dash to guide data-driven transformation rewrites
Data Analyst - NewYork-Presbyterian
Dec 2020 - Jul 2023

Worked for NewYork Quality Care - the ACO of NewYork-Presbyterian, Weill Cornell, and Columbia

  • Managed the calculation, tracking, and reporting of quality metrics, leading to $20M+ in savings
  • Built weekly analytics ELT pipeline (100M+ encounter records for 35k Medicare patients)
  • Led the adoption of geographic analysis by designing a custom address-cleaning and geolocation workflow
  • Developed data cleaning helper functions in Python and R used by 9 other analysts
  • Reduced Tableau loading times from minutes to seconds by designing composable data models for our team
Data Engineer (contract) - UTHealth
May 2020 - Nov 2023

Built and solely maintained the database powering the UTHealth COVID-19 dashboard

  • Created and maintained daily Texas COVID-19 data pipeline from state and third-party sources
  • Scraped web data with Python (REST APIs, Beautiful Soup, Selenium)
  • Developed a monitoring Slack bot and unit tests to ensure consistent data quality
Research Coordinator I - Baylor College of Medicine
Jun 2017 - Nov 2019
  • Authored thesis (100+ citations) on biomarkers of traumatic brain injury (TBI) and provided data support for other research efforts
  • Applied variable selection on hundreds of biomarker combinations to identify TBI predictors
  • Built analysis pipelines and created publication-ready data visualizations in R

Education

2018 - 2020
Master of Science in Epidemiology (Minor in Biostatistics)
UTHealth Houston
GPA: 3.8
  • Studied Epidemiology, with a thesis on the biomarkers of Traumatic Brain Injury (TBI)
  • Coursework included biostatistics, applied machine learning, and study design
2014 - 2018
Bachelor of Science in Public Health
University of Texas at Austin
GPA: 3.7

Received a comprehensive public health education, covering global health, infectious disease microbiology, and biostatistics

Extracurricular Activities

  • President of SURGe - a student-led organization connecting STEM faculty with students needing research experience

Projects

Ampere
Python Dagster dbt Plotly Dash DuckDB FastAPI CLI
Database and dashboard to track work in the mrpowers-io organization
jpmdb
Python Dagster dbt Plotly Dash DuckDB
Small data cleaning project to create a personalized movie database for my friend
Quinn
Python PySpark OSS devtools
PySpark helper library downloaded 600k+ times per month
TexasPandemics
R Python Selenium REST Tableau
Collection of epidemiology data powering the UTHealth COVID-19 dashboard