A series of discoveries at the intersection of data optimization and generalization has engulfed my non-work attention over the last few months, and I have emerged with a master's in AI.
I focused on a question that first sparked my curiosity during my PhD work: how well do population-level health patterns hold up when applied to individual or localized contexts?
To explore this, I developed a novel method to combine multiple large-scale public health datasets into synthetic versions that retain key population trends while expanding analytic flexibility and possibility. Over ten thousand lines of Python code prepared, transformed, and integrated NHANES, NHIS, and BRFSS data before synthesizing their aligned combinations into eight separate diagnostic cohorts. Subsequent variance retention analysis measured intra-variable distributions and inter-variable correlations together.
My plan for the coming months includes sharing some of the underlying code on this website, primarily for my own health informatics students but also for any other data science enthusiasts keen to explore innovation in this fascinating area of study.