Hui Fen (Sarah) Tan
PhD student, Cornell Statistics

Contact Me
h t 3 9 5 AT cornell DOT edu
Google Scholar

I'm a PhD student at Cornell Statistics, minoring in Computer Science. I'm currently a visiting student at UCSF. Broadly, I work on interpretability of machine learning methods, in particular tree ensembles. I'm also interested in causal inference and algorithmic fairness, and am affiliated with Cornell's Algorithms, Big Data, and Inequality program.

I'm advised by Giles Hooker and Martin Wells, and Thorsten Joachims is on my committee. Previously, I studied at Berkeley and Columbia, and worked in public policy in NYC, including at the health department, the public hospital system, and a media nonprofit. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 at Xerox Research (now Naver Labs) and summers 2017 and 2018 at Microsoft Research, working with Rich Caruana. I'm on the board of the Women in Machine Learning organization.


Code & Data

Publications, Presentations, Preprints

Tan, R Caruana, G Hooker, Y Lou. Auditing Black-Box Models Using Transparent Model Distillation With Side Information. A short version appeared as an Oral at AAAI/ACM AIES 2018 and Spotlight at NIPS 2017 Interpretability Symposium. Media coverage: MIT Technology Review, Politico, Futurism. Code and Data

Tan, R Caruana, G Hooker, A Gordo. Transparent Model Distillation. Working paper

Tan. Interpretable Approaches to Detect Bias in Black-Box Models. AAAI/ACM AIES 2018 Doctoral Consortium

Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark. A Bayesian Evidence Synthesis Approach to Estimate Disease Prevalence in Hard-To-Reach Populations: Hepatitis C in New York City. Epidemics 2018. Presented to NYC Health Commissioner. Talk at NDRI. Code

Tan, G Hooker, M Wells. Probabilistic Matching: Incorporating Uncertainty to Improve Propensity Score Matching. In preparation. Preliminary version in NIPS 2016 Causal Inference Workshop

Tan, G Hooker, M Wells. Peeking into the Random Forest: Post-Hoc Interpretability Using Prototypes. In preparation. Preliminary version in NIPS 2016 Interpretability Workshop

Tan, D Miller, J Savage. Proximity Score Matching: A Locally-Adaptive Random Forest Metric for Matching in Causal Inference. Under review. Student Paper Award from the American Statistical Association's SSPA section. Lightning talk at Atlantic Causal Inference Conference 2015. NIPS 2015 Machine Learning in Healthcare Workshop

(Equal contribution) S Seto*, Tan*, G Hooker, M Wells. A Double Parametric Bootstrap Test for Topic Models. NIPS 2017 Interpretability Symposium

I Vasi, E Walker, JS Johnson, Tan. "No Fracking Way!" Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013. American Sociological Review 2015. 2 Best Paper Awards from the American Sociological Association's CITAMS and CBSM sections. Press releases: University of Iowa, Harmony Institute. Media coverage: The Guardian, The Atlantic, Pacific Standard

Older Work

Machine Learning

(Alphabetical) FB Darku, S He, MA Hossain, S Ren, Tan, I Trejo-Lorenzo. Positive Unlabeled Learning for Anomaly Detection in Nut Allergies Protein Microarray Data. IMSM 2016 project; CRSC Technical Report TR16-08. Talk at SAMSI. Press releases: SAMSI, Rho, Inc

Tan, R Rotabi, HGT Nguyen. Using Ranking Support Vector Machines for Group Recommendations. NYAS Machine Learning Symposium 2015

(Alphabetical) S Abraham, J Lockhart, Tan, R Turner, Y Kim. Identifying At-Risk Mothers for Targeted Interventions. KDD 2014 Session on Data Science for Social Good. Talk at Chicago Python User Group. Blog post and presentation

Statistical Methods for Healthcare

Tan, R Low, S Ito, R Gregory, L Bielory, V Dunn. Two Ways of Modeling Hospital Readmissions: Mixed and Marginal Models. JSM 2013

Tan, R Low, S Ito, R Gregory, V Dunn. Drug Interactions of Beta Blockers and Beta Agonists and Their Association with Hospital Admissions. SAS GF 2013

R Low, S Ito, R Gregory, L Rassi, Tan, C Jacobs. Hospital Readmission Rates: Related To ED Volume, Population, And Economic Variables. Society for Academic Emergency Medicine 2012

Invited Talks


Awards & Grants

Fun Stuff