I am a 4th year PhD student at Cornell Statistics, minoring in Computer Science. Broadly, I work on inference and interpretability of machine learning methods, in particular tree ensembles, as well as on causal inference. I am advised by Giles Hooker and Martin Wells, and Thorsten Joachims is on my committee.
Previously, I studied at Berkeley and Columbia, and worked at startups and nonprofits in NYC. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 at Xerox Research in the machine learning group. I am at Microsoft Research Redmond this summer, working with Rich Caruana. My work is supported by a Harmony Institute Research Fellowship and an Engaged Cornell grant.
I co-organized the 2016 Women in Machine Learning workshop, co-located with NIPS.
HF Tan, G Hooker, M Wells. Peeking into the Random Forest: Interpretability in Tree and Observation Space. In submission. Related work: Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable. NIPS Interpretability Workshop 2016. [blog mention]
HF Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark. Using Bayesian Evidence Synthesis to Estimate Disease Prevalence Among Hard-To-Reach Populations: Hepatitis C in New York City. In revision; presented to NYC Health Commissioner.
HF Tan, D Miller, J Savage. Proximity Score Matching: Using Random Forest Distance for Matching in Causal Inference. Student Travel Award, American Statistical Association SSPA section; NIPS Machine Learning in Healthcare Workshop 2015. Journal version in submission.
HF Tan, G Hooker, M Wells. Probabilistic Matching: Incorporating Uncertainty to Correct for Selection Bias. NIPS Causal Inference Workshop 2016.
I Vasi, E Walker, JS Johnson, HF Tan. "No Fracking Way!" Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013. 2 Best Paper awards, American Sociological Association CITAMS and CBSM sections. American Sociological Review 2015. [press release] [press release] [The Guardian] [The Atlantic] [Pacific Standard]
(Alphabetical) FB Darku, S He, MA Hossain, S Ren, HF Tan, I Trejo-Lorenzo. Positive Unlabeled Learning for Anomaly Detection in Nut Allergies Protein Microarray Data. IMSM 2016 project; CRSC Technical Report TR16-08. [press release] [blog post]
HF Tan, R Rotabi, HGT Nguyen. Using Ranking Support Vector Machines for Group Recommendations. NYAS Machine Learning Symposium 2015.
HF Tan, R Low, S Ito, R Gregory, L Bielory, V Dunn. Two Ways of Modeling Hospital Readmissions: Mixed and Marginal Models. Proceedings of JSM 2013.
HF Tan, R Low, S Ito, R Gregory, V Dunn. Drug Interactions of Beta Blockers and Beta Agonists and Their Association with Hospital Admissions. Proceedings of SAS GF 2013
R Low, S Ito, R Gregory, L Rassi, HF Tan, C Jacobs. Hospital Readmission Rates: Related To ED Volume, Population, And Economic Variables. Academic Emergency Medicine 2012