Jeff Brennan

Data Engineer

I am passionate about improving healthcare outcomes by building robust pipelines, proactive monitoring, and devtools that empower data practitioners. I have a background in public health and manage large data platforms that create population-level insights to inform clinical decisions.

In my spare time, I enjoy contributing to open source projects. I am currently focused on improving the Python developer experience in Zed.

Skills

  • Python (advanced)
  • SQL (advanced)
  • R (advanced)
  • Scala (intermediate)
  • PySpark
  • Dagster
  • Docker
  • Plotly Dash
  • dbt
  • Polars
  • Databricks Workflows
  • Pydantic

Experience

Data Engineer III at Astrana Health
Jul 2025 - Present
Building a unified data platform to streamline the ingestion and analysis of billions of clinical records
Data Engineer at Medisolv
Jul 2023 - Jul 2025

Oversaw the ingestion and transformation of patient data (10B+ records/week, 150TB+ data lake) for hundreds of hospital clients

  • Architected the core streaming pipeline that powers real-time reporting on key business metrics, performance trend alerts, and automated responses to common errors
  • Created a CLI to generate Databricks Workflows - enabling our team to programmatically tailor cluster configurations for clients with thousands to millions of patients
  • Led the migration of 1,500+ Azure Data Factory (ADF) pipelines to a source-controlled repository. Implemented CI/CD in Azure DevOps and automated pipeline validation using Pydantic and Pytest
  • Developed a Spark metrics parser using Plotly Dash to guide data-driven transformation rewrites
Data Analyst at NewYork-Presbyterian
Dec 2020 - Jul 2023

Worked for NewYork Quality Care - the ACO of NewYork-Presbyterian, Weill Cornell, and Columbia

  • Managed the calculation, tracking, and reporting of quality metrics, leading to $20M+ in savings
  • Built a weekly analytics ELT pipeline (100M+ encounter records for 35k Medicare patients)
  • Led the adoption of geographic analysis by designing a custom address-cleaning and geolocation workflow
  • Developed data cleaning helper functions in Python and R used by 9 other analysts
  • Reduced Tableau loading times from minutes to seconds by designing composable data models for our team
Data Engineer (contract) at UTHealth
May 2020 - Nov 2023

Built and solely maintained database powering the UTHealth COVID-19 dashboard

  • Created and maintained daily Texas COVID-19 data pipeline from state and third-party sources
  • Scraped data with Python (REST APIs, Beautiful Soup, Selenium)
  • Developed a monitoring Slack bot and unit tests to ensure consistent data quality
Research Coordinator I at Baylor College of Medicine
Jun 2017 - Nov 2019
  • Authored a thesis (100+ citations) on biomarkers of traumatic brain injury (TBI) and provided data support for other research efforts
  • Applied variable selection on hundreds of biomarker combinations to identify TBI predictors
  • Built analysis pipelines and created publication-ready data visualizations in R

Education

2018 - 2020
Master of Science in Epidemiology
UTHealth Houston
GPA: 3.8
  • Studied Epidemiology, with a thesis on the biomarkers of Traumatic Brain Injury (TBI)
  • Coursework included biostatistics, applied machine learning, and study design
2014 - 2018
Bachelor of Science in Public Health
University of Texas at Austin
GPA: 3.7

Received a comprehensive public health education spanning biostatistics, infectious disease microbiology, and global health

Extracurricular Activities

  • President of SURGe - a student-led organization connecting STEM faculty with students needing research experience

Projects

Database and dashboard to track work in the mrpowers-io organization
Small data cleaning project to create a personalized movie database for my friend
PySpark helper library downloaded 600k+ times per month
Collection of epidemiology data powering the UTHealth COVID-19 dashboard