Data Engineer - Medisolv
Jul 2023 - present
Oversee the ingestion and transformation of patient data (10B+ records/week, 150TB+ data lake) for hundreds of hospital clients
- Architected the core streaming pipeline that powers real-time reporting on key business metrics, performance trend alerts, and automated responses to common errors
- Created a CLI to generate Databricks Workflows, enabling our team to programmatically tailor cluster configurations for clients with thousands to millions of patients
- Led the migration of 1,500+ Azure Data Factory (ADF) pipelines to a source-controlled repository. Implemented CI/CD in Azure DevOps and automated pipeline validation using Pydantic and pytest
- Developed a Spark metrics parser using Plotly Dash to guide data-driven transformation rewrites
Worked for NewYork Quality Care - the ACO of NewYork-Presbyterian, Weill Cornell, and Columbia
- Managed the calculation, tracking, and reporting of quality metrics, leading to $20M+ in savings
- Built a weekly analytics ELT pipeline (100M+ encounter records for 35k Medicare patients)
- Led the adoption of geographic analysis by designing a custom address-cleaning and geolocation workflow
- Developed data cleaning helper functions in Python and R used by 9 other analysts
- Reduced Tableau loading times from minutes to seconds by designing composable data models for our team
Data Engineer (contract) - UTHealth
May 2020 - Nov 2023
Built and solely maintained the database powering the UTHealth COVID-19 dashboard
- Created and maintained a daily Texas COVID-19 data pipeline from state and third-party sources
- Scraped web data with Python (REST APIs, Beautiful Soup, Selenium)
- Developed a monitoring Slack bot and unit tests to ensure consistent data quality
- Authored a thesis (100+ citations) on biomarkers of traumatic brain injury (TBI) and provided data support for other research efforts
- Applied variable selection on hundreds of biomarker combinations to identify TBI predictors
- Built analysis pipelines and created publication-ready data visualizations in R