Hui Fen (Sarah) Tan
PhD student, Cornell Statistics

Contact Me
h t 3 9 5 AT cornell DOT edu
Github   LinkedIn

I am a PhD student at Cornell Statistics, minoring in Computer Science. Broadly, I work on causal inference and on the inference and interpretability of machine learning methods, in particular tree ensembles. I am advised by Giles Hooker and Martin Wells, and Thorsten Joachims is on my committee.

Previously, I studied at Berkeley and Columbia, and worked at startups, nonprofits, and public agencies in NYC, including the city's health department and public hospital system. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 in the machine learning group at Xerox Research (now Naver Labs). In summer 2017, I am at Microsoft Research Redmond, working with Rich Caruana. My work is supported by a Harmony Institute Research Fellowship and an Engaged Cornell grant.

I co-organized the 2016 Women in Machine Learning workshop, co-located with NIPS.



R package: surfin (Statistical Inference for Random Forests)

Publications & Presentations

Machine Learning

HF Tan, G Hooker, M Wells. Peeking into the Random Forest: Interpretability in Tree and Observation Space. In submission. Preliminary version: Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable. NIPS Interpretability Workshop 2016. [blog mention]

HF Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark. Using Bayesian Evidence Synthesis to Estimate Disease Prevalence Among Hard-To-Reach Populations: Hepatitis C in New York City. Under review; presented to the NYC Health Commissioner.

HF Tan, D Miller, J Savage. Proximity Score Matching: Using Random Forest Distance for Matching in Causal Inference. Student Paper Award, American Statistical Association SSPA section; NIPS Machine Learning in Healthcare Workshop 2015.

HF Tan, G Hooker, M Wells. Probabilistic Matching: Incorporating Uncertainty to Correct for Selection Bias. NIPS Causal Inference Workshop 2016; journal version in submission.

Natural Language Processing Applications

I Vasi, E Walker, JS Johnson, HF Tan. "No Fracking Way!" Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013. American Sociological Review 2015. 2 Best Paper Awards, American Sociological Association CITAMS and CBSM sections. [press release] [press release] [The Guardian] [The Atlantic] [Pacific Standard]

Older Work

Machine Learning

(Alphabetical) FB Darku, S He, MA Hossain, S Ren, HF Tan, I Trejo-Lorenzo. Positive Unlabeled Learning for Anomaly Detection in Nut Allergies Protein Microarray Data. IMSM 2016 project; CRSC Technical Report TR16-08. [press release] [blog post]

HF Tan, R Rotabi, HGT Nguyen. Using Ranking Support Vector Machines for Group Recommendations. NYAS Machine Learning Symposium 2015.

(Alphabetical) S Abraham, J Lockhart, HF Tan, R Turner, Y Kim. Identifying At-Risk Mothers for Targeted Interventions. KDD 2014 Session on Data Science for Social Good. [blog post] [presentation]

Statistical Methods for Healthcare

HF Tan, R Low, S Ito, R Gregory, L Bielory, V Dunn. Two Ways of Modeling Hospital Readmissions: Mixed and Marginal Models. Proceedings of JSM 2013.

HF Tan, R Low, S Ito, R Gregory, V Dunn. Drug Interactions of Beta Blockers and Beta Agonists and Their Association with Hospital Admissions. Proceedings of SAS GF 2013.

R Low, S Ito, R Gregory, L Rassi, HF Tan, C Jacobs. Hospital Readmission Rates: Related To ED Volume, Population, And Economic Variables. Academic Emergency Medicine 2012.


Organizing Committee: ICHPS 2018, WiML 2016, Statistics for Social Good at JSM 2016


(aka some things I did for fun)

Last Updated

Aug 2017