I'm a PhD student at Cornell Statistics, minoring in Computer Science. I'm advised by Giles Hooker and Martin Wells, and Thorsten Joachims is on my committee. Previously, I studied at Berkeley and Columbia and worked in NYC at startups and nonprofits, as well as at the city's health department and public hospital system. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 at Xerox Research (now Naver Labs) and summer 2017 at Microsoft Research Redmond, working with Rich Caruana. My work is supported by a Harmony Institute Research Fellowship and an Engaged Cornell grant.
Broadly, I work on causal inference and on inference and interpretability for machine learning methods, particularly tree ensembles. I especially enjoy developing methods that are useful for healthcare and public policy.
Black-box risk score data sets: coming soon
Tan, R Caruana, G Hooker, Y Lou. Detecting Bias in Black-Box Models Using Transparent Model Distillation. Under review. Short version accepted to NIPS Interpretability Symposium 2017. [MIT Technology Review]
Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark. Using Bayesian Evidence Synthesis to Estimate Disease Prevalence Among Hard-To-Reach Populations: Hepatitis C in New York City. Under review. Presented to the NYC Health Commissioner
Tan, G Hooker, M Wells. Probabilistic Matching: Incorporating Uncertainty to Correct for Selection Bias. Under review. Preliminary version at the NIPS Causal Inference Workshop 2016
Tan, G Hooker, M Wells. Peeking into the Random Forest: Interpretability in Tree and Observation Space. In submission. Preliminary version: Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable. NIPS Interpretability Workshop 2016. [blog mention]
Tan, D Miller, J Savage. Proximity Score Matching: Using Random Forest Distance for Matching in Causal Inference. Student Paper Award, American Statistical Association SSPA section. Lightning talk at the Atlantic Causal Inference Conference 2015. NIPS Machine Learning in Healthcare Workshop 2015
(Equal contribution) S Seto*, Tan*, G Hooker, M Wells. A Double Parametric Bootstrap Test for Topic Models. Accepted to NIPS Interpretability Symposium 2017
I Vasi, E Walker, JS Johnson, Tan. "No Fracking Way!" Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013. American Sociological Review 2015. Two Best Paper Awards, American Sociological Association CITAMS and CBSM sections. [press release] [press release] [The Guardian] [The Atlantic] [Pacific Standard]
(Alphabetical) FB Darku, S He, MA Hossain, S Ren, Tan, I Trejo-Lorenzo. Positive Unlabeled Learning for Anomaly Detection in Nut Allergies Protein Microarray Data. IMSM 2016 project; CRSC Technical Report TR16-08. [press release] [blog post]
Tan, R Rotabi, HGT Nguyen. Using Ranking Support Vector Machines for Group Recommendations. NYAS Machine Learning Symposium 2015
Tan, R Low, S Ito, R Gregory, L Bielory, V Dunn. Two Ways of Modeling Hospital Readmissions: Mixed and Marginal Models. Proceedings of JSM 2013
Tan, R Low, S Ito, R Gregory, V Dunn. Drug Interactions of Beta Blockers and Beta Agonists and Their Association with Hospital Admissions. Proceedings of SAS Global Forum 2013
R Low, S Ito, R Gregory, L Rassi, Tan, C Jacobs. Hospital Readmission Rates: Related To ED Volume, Population, And Economic Variables. Academic Emergency Medicine 2012