Hui Fen (Sarah) Tan
PhD student, Cornell Statistics

Contact Me
h t 3 9 5 AT cornell DOT edu
GitHub   LinkedIn

I'm a PhD student at Cornell Statistics, minoring in Computer Science. I'm a visiting student at UCSF in the Bay Area in Spring 2018. Broadly, I work on causal inference and the interpretability of machine learning methods, particularly tree ensembles.

I'm advised by Giles Hooker and Martin Wells, and Thorsten Joachims is on my committee. Previously, I studied at Berkeley and Columbia, and worked in public policy and nonprofits in NYC, including a media nonprofit, the health department, and the public hospital system. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 at Xerox Research (now Naver Labs) and summer 2017 at Microsoft Research, working with Rich Caruana.


Code & Data

Publications, Presentations, Preprints

Tan, R Caruana, G Hooker, A Gordo. Transparent Model Distillation. Working paper

Tan, R Caruana, G Hooker, Y Lou. Auditing Black-Box Models Using Transparent Model Distillation With Side Information. A short version appeared as an Oral at AIES 2018 and Spotlight at NIPS 2017 Interpretability Symposium. Media coverage: MIT Technology Review, Politico, Futurism. Code and Data

Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark. A Bayesian Evidence Synthesis Approach to Estimate Disease Prevalence in Hard-To-Reach Populations: Hepatitis C in New York City. Epidemics 2018. Presented to NYC Health Commissioner. Talk at NDRI. Code

Tan, G Hooker, M Wells. Probabilistic Matching: Incorporating Uncertainty to Improve Propensity Score Matching. In preparation. Preliminary version in NIPS 2016 Causal Inference Workshop

Tan, G Hooker, M Wells. Peeking into the Random Forest: Post-Hoc Interpretability Using Prototypes. In preparation. Preliminary version in NIPS 2016 Interpretability Workshop

Tan, D Miller, J Savage. Proximity Score Matching: A Locally-Adaptive Random Forest Metric for Matching in Causal Inference. Under review. Student Paper Award from the American Statistical Association's SSPA section. Lightning talk at Atlantic Causal Inference Conference 2015. NIPS 2015 Machine Learning in Healthcare Workshop

(Equal contribution) S Seto*, Tan*, G Hooker, M Wells. A Double Parametric Bootstrap Test for Topic Models. NIPS 2017 Interpretability Symposium

I Vasi, E Walker, JS Johnson, Tan. "No Fracking Way!" Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013. American Sociological Review 2015. 2 Best Paper Awards from the American Sociological Association's CITAMS and CBSM sections. Press releases: University of Iowa, Harmony Institute. Media coverage: The Guardian, The Atlantic, Pacific Standard

Older Work

Machine Learning

(Alphabetical) FB Darku, S He, MA Hossain, S Ren, Tan, I Trejo-Lorenzo. Positive Unlabeled Learning for Anomaly Detection in Nut Allergies Protein Microarray Data. IMSM 2016 project; CRSC Technical Report TR16-08. Talk at SAMSI. Press releases: SAMSI, Rho, Inc

Tan, R Rotabi, HGT Nguyen. Using Ranking Support Vector Machines for Group Recommendations. NYAS Machine Learning Symposium 2015

(Alphabetical) S Abraham, J Lockhart, Tan, R Turner, Y Kim. Identifying At-Risk Mothers for Targeted Interventions. KDD 2014 Session on Data Science for Social Good. Talk at Chicago Python User Group. Blog post and presentation

Statistical Methods for Healthcare

Tan, R Low, S Ito, R Gregory, L Bielory, V Dunn. Two Ways of Modeling Hospital Readmissions: Mixed and Marginal Models. JSM 2013

Tan, R Low, S Ito, R Gregory, V Dunn. Drug Interactions of Beta Blockers and Beta Agonists and Their Association with Hospital Admissions. SAS GF 2013

R Low, S Ito, R Gregory, L Rassi, Tan, C Jacobs. Hospital Readmission Rates: Related To ED Volume, Population, And Economic Variables. Society for Academic Emergency Medicine 2012

Invited Talks


Awards & Grants

Fun Stuff