Hui Fen (Sarah) Tan
PhD student, Cornell Statistics

Contact Me
h t 3 9 5 AT cornell DOT edu
Github   LinkedIn

I'm a PhD student at Cornell Statistics, minoring in Computer Science. I'm visiting UCSF in Spring 2018. Broadly, I work on causal inference and on inference and interpretability of machine learning methods, in particular tree ensembles.

I'm advised by Giles Hooker and Martin Wells; Thorsten Joachims and Rich Caruana are on my committee. Previously, I studied at Berkeley and Columbia, and worked in startups and nonprofits in NYC, including the health department and public hospitals system. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 at Xerox Research (now Naver Labs) and summer 2017 at Microsoft Research, working with Rich Caruana.


Code & Data

Publications & Presentations

Tan, R Caruana, G Hooker, Y Lou. Detecting Bias in Black-Box Models Using Transparent Model Distillation. Oral at AIES 2018. Spotlight at NIPS Interpretability Symposium 2017. [MIT Technology Review]

Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark. A Bayesian Evidence Synthesis Approach to Estimate Disease Prevalence in Hard-To-Reach Populations: Hepatitis C in New York City. Under review. Presented to NYC Health Commissioner. Talk at National Development and Research Institutes

Tan, G Hooker, M Wells. Probabilistic Matching: Incorporating Uncertainty to Correct for Selection Bias. In preparation. Preliminary version in NIPS 2016 Causal Inference Workshop

Tan, G Hooker, M Wells. Peeking into the Random Forest: Interpretability in Tree and Observation Space. In preparation. Preliminary version, "Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable," in NIPS 2016 Interpretability Workshop

Tan, D Miller, J Savage. Proximity Score Matching: A Locally-Adaptive Random Forest Metric for Matching in Causal Inference. Under review. Student Paper Award, American Statistical Association SSPA section. Lightning talk, Atlantic Causal Inference Conference 2015. NIPS 2015 Machine Learning in Healthcare Workshop

(Equal contribution) S Seto*, Tan*, G Hooker, M Wells. A Double Parametric Bootstrap Test for Topic Models. NIPS 2017 Interpretability Symposium

I Vasi, E Walker, JS Johnson, Tan. "No Fracking Way!" Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013. American Sociological Review 2015. 2 Best Paper Awards, American Sociological Association CITAMS and CBSM sections. [press release] [press release] [The Guardian] [The Atlantic] [Pacific Standard]

Older Work

Machine Learning

(Alphabetical) FB Darku, S He, MA Hossain, S Ren, Tan, I Trejo-Lorenzo. Positive Unlabeled Learning for Anomaly Detection in Nut Allergies Protein Microarray Data. IMSM 2016 project; CRSC Technical Report TR16-08. Talk at SAMSI. [press release] [blog post]

Tan, R Rotabi, HGT Nguyen. Using Ranking Support Vector Machines for Group Recommendations. NYAS Machine Learning Symposium 2015

(Alphabetical) S Abraham, J Lockhart, Tan, R Turner, Y Kim. Identifying At-Risk Mothers for Targeted Interventions. KDD 2014 Session on Data Science for Social Good. Talk at Chicago Python User Group. [blog post] [presentation]

Statistical Methods for Healthcare

Tan, R Low, S Ito, R Gregory, L Bielory, V Dunn. Two Ways of Modeling Hospital Readmissions: Mixed and Marginal Models. JSM 2013

Tan, R Low, S Ito, R Gregory, V Dunn. Drug Interactions of Beta Blockers and Beta Agonists and Their Association with Hospital Admissions. SAS GF 2013

R Low, S Ito, R Gregory, L Rassi, Tan, C Jacobs. Hospital Readmission Rates: Related To ED Volume, Population, And Economic Variables. Society for Academic Emergency Medicine 2012


Awards & Grants

Fun Stuff