Hi, I'm

Jeff Brennan

data engineer

I am passionate about improving healthcare outcomes by building robust pipelines, proactive monitoring, and devtools that empower data practitioners. I have a background in public health and epidemiology, and found a love for using technology to create population-level insights that inform clinical decisions. I work remotely as a data engineer for Medisolv and am based in Brooklyn, NY.

In my spare time, I contribute to mrpowers-io, a collection of devtools for working with big data (1M+ monthly downloads), and build a dashboard to track the impact of our efforts.


Here are some of the technologies that I have been working with recently:
  • Python (advanced)
  • SQL (advanced)
  • R (advanced)
  • Scala (intermediate)
  • PySpark
  • Dagster
  • Docker
  • Plotly Dash
  • dbt
  • Polars
  • Databricks Workflows
  • Pydantic


Data Engineer - Medisolv
Jul 2023 - present

Oversee the ingestion and transformation of patient data (10B+ records/week, 150TB+ data lake) for hundreds of hospital clients

  • Created a Python CLI wrapper around Databricks Asset Bundles to programmatically generate Databricks Workflows, enabling a 20x runtime improvement and $100k+ yearly savings through task-specific cluster tuning
  • Designed our central Git repository to store Databricks jobs, tests, and utilities. Trained 12 analysts and engineers on software development best practices, enabling faster iteration, regression testing, and improved process visibility for management.
  • Led the migration of 1,500+ Azure Data Factory (ADF) pipelines to a source-controlled infrastructure-as-code (IaC) implementation. Developed tests to validate pipelines before they were published to production, reducing our error rate by 34%
  • Developed a Spark metrics parser using Plotly Dash to guide data-driven rewrites of transformations, saving thousands of dollars in wasted compute time and storage access costs.

Data Analyst - NewYork-Presbyterian
Dec 2020 - Jul 2023

Worked for NewYork Quality Care, the accountable care organization (ACO) of NewYork-Presbyterian, Weill Cornell, and Columbia

  • Managed the calculation, tracking, and reporting of quality metrics, leading to $20M+ in savings
  • Built a weekly analytics ELT pipeline (100M+ encounter records for 35k Medicare patients)
  • Led the adoption of geographic analysis by designing a custom address cleaning and geolocation workflow
  • Developed data cleaning helper functions in Python and R used by 9 other analysts
  • Reduced Tableau loading times from minutes to seconds by designing composable data models for our team

Data Engineer (contract) - UTHealth
May 2020 - Nov 2023

Built and solely maintained the database powering the UTHealth COVID-19 dashboard

  • Created and maintained daily Texas COVID-19 data pipeline from state and third-party sources
  • Scraped web data with Python (REST APIs, Beautiful Soup, Selenium)
  • Developed a monitoring Slack bot and unit tests to ensure consistent data quality

Research Coordinator I - Baylor College of Medicine
Jun 2017 - Nov 2019
  • Authored a thesis (100+ citations) on biomarkers of traumatic brain injury (TBI) and provided data support for other research efforts
  • Applied variable selection on hundreds of biomarker combinations to identify TBI predictors
  • Built analysis pipelines and created publication-ready data visualizations in R


2018 - 2020
Master of Science in Epidemiology (Minor Biostatistics)
UTHealth Houston
GPA: 3.8
  • Studied Epidemiology, with a thesis on the biomarkers of Traumatic Brain Injury (TBI)
  • Coursework included biostatistics, applied machine learning, and study design
2014 - 2018
Bachelor of Science in Public Health
University of Texas at Austin
GPA: 3.7

Received a comprehensive public health education covering global health, infectious disease microbiology, and biostatistics

Extracurricular Activities

  • President of SURGe - a student-led organization connecting STEM faculty with students needing research experience


Python, Dagster, dbt, Plotly Dash, DuckDB, FastAPI, CLI
Database and dashboard to track work in the mrpowers-io organization
Python, Dagster, dbt, Plotly Dash, DuckDB
Small data cleaning project to create a personalized movie database for my friend
Python, PySpark, OSS devtools
PySpark helper library downloaded 600k+ times per month
R, Python, Selenium, REST, Tableau
Collection of epidemiology data powering the UTHealth COVID-19 dashboard