Hui Fen (Sarah) Tan
PhD student, Cornell Statistics

Contact Me
h t 3 9 5 AT cornell DOT edu
GitHub | LinkedIn

I am a fourth-year PhD student at Cornell Statistics, minoring in Computer Science. Broadly, I work on inference and interpretability for machine learning methods (in particular tree ensembles) and on causal inference. I am advised by Giles Hooker and Martin Wells, and Thorsten Joachims is on my committee.

Previously, I studied at Berkeley and Columbia, and worked at startups and nonprofits in NYC. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 in the machine learning group at Xerox Research. I will be at Microsoft Research Redmond this summer. My work is supported by a Harmony Institute Research Fellowship and an Engaged Cornell grant.



R package surfin (Statistical Inference for Random Forests)

Publications & Presentations

Machine Learning

HF Tan, G Hooker, M Wells. Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable. NIPS Interpretability Workshop 2016

HF Tan, G Hooker, M Wells. Probabilistic Matching: Incorporating Uncertainty to Correct for Selection Bias. NIPS Causal Inference Workshop 2016

HF Tan, D Miller, J Savage. Proximity Score Matching: Using Random Forest Distance for Matching in Causal Inference. Student Travel Award, American Statistical Association SSPA section; NIPS Machine Learning in Healthcare Workshop 2015; journal version in submission

(Alphabetical) FB Darku, S He, MA Hossain, S Ren, HF Tan, I Trejo-Lorenzo. Positive Unlabeled Learning for Anomaly Detection in Nut Allergies Protein Microarray Data. IMSM 2016 project; CRSC Technical Report TR16-08. [blog post]

HF Tan, R Rotabi, HGT Nguyen. Using Ranking Support Vector Machines for Group Recommendations. NYAS Machine Learning Symposium 2015

(Alphabetical) S Abraham, J Lockhart, HF Tan, R Turner, Y Kim. Identifying At-Risk Mothers for Targeted Interventions. KDD 2014 Session on Data Science for Social Good. [blog post] [presentation]

Natural Language Processing Applications

I Vasi, E Walker, JS Johnson, HF Tan. "No Fracking Way!" Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013. Two Best Paper awards, American Sociological Association CITAMS and CBSM sections. American Sociological Review 2015. [press coverage] [press release] [press release]

Survey Sampling Applications

HF Tan, S Makela, J Stark, K Konty, S Balter, T Zheng. Using Bayesian Evidence Synthesis to Estimate Disease Prevalence Among Hard-To-Reach Populations: Hepatitis C in New York City. In revision; presented to the NYC Health Commissioner

Statistical Methods for Healthcare

HF Tan, R Low, S Ito, R Gregory, L Bielory, V Dunn. Two Ways of Modeling Hospital Readmissions: Mixed and Marginal Models. Proceedings of JSM 2013

HF Tan, R Low, S Ito, R Gregory, V Dunn. Drug Interactions of Beta Blockers and Beta Agonists and Their Association with Hospital Admissions. Proceedings of SAS GF 2013

R Low, S Ito, R Gregory, L Rassi, HF Tan, C Jacobs. Hospital Readmission Rates: Related To ED Volume, Population, And Economic Variables. Academic Emergency Medicine 2012


Organizing Committee: ICHPS 2018, WiML 2016, Statistics for Social Good at JSM 2016
President, Cornell Statistics Graduate Society


(aka some things I did for fun)

Last Updated

January 2017