I'm a PhD student at Cornell Statistics, minoring in Computer Science. I'm advised by Giles Hooker and Martin Wells, and Thorsten Joachims is on my committee. Previously, I studied at Berkeley and Columbia, and worked at startups and nonprofits in NYC, as well as the health department and public hospital system. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 at Xerox Research (now Naver Labs) and summer 2017 at Microsoft Research Redmond, working with Rich Caruana. My work is supported by a Harmony Institute Research Fellowship and an Engaged Cornell grant.
Broadly, I work on causal inference and on inference and interpretability for machine learning methods, in particular tree ensembles. I particularly enjoy working on methods useful for healthcare and public policy.
Black-box risk scores data sets: coming soon
Tan, R Caruana, G Hooker, Y Lou. Detecting Bias in Black-Box Models Using Transparent Model Distillation. Under review
Tan, G Hooker, M Wells. Peeking into the Random Forest: Interpretability in Tree and Observation Space. In submission. Preliminary version: Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable. NIPS Interpretability Workshop 2016. [blog mention]
Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark. Using Bayesian Evidence Synthesis to Estimate Disease Prevalence Among Hard-To-Reach Populations: Hepatitis C in New York City. Under review; presented to NYC Health Commissioner
Tan, G Hooker, M Wells. Probabilistic Matching: Incorporating Uncertainty to Correct for Selection Bias. NIPS Causal Inference Workshop 2016; journal version in submission
Tan, D Miller, J Savage. Proximity Score Matching: Using Random Forest Distance for Matching in Causal Inference. Student Paper Award, American Statistical Association SSPA section; NIPS Machine Learning in Healthcare Workshop 2015
I Vasi, E Walker, JS Johnson, Tan. "No Fracking Way!" Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013. American Sociological Review 2015. 2 Best Paper Awards, American Sociological Association CITAMS and CBSM sections. [press release] [press release] [The Guardian] [The Atlantic] [Pacific Standard]
(Alphabetical) FB Darku, S He, MA Hossain, S Ren, Tan, I Trejo-Lorenzo. Positive Unlabeled Learning for Anomaly Detection in Nut Allergies Protein Microarray Data. IMSM 2016 project; CRSC Technical Report TR16-08. [press release] [blog post]
Tan, R Rotabi, HGT Nguyen. Using Ranking Support Vector Machines for Group Recommendations. NYAS Machine Learning Symposium 2015
Tan, R Low, S Ito, R Gregory, L Bielory, V Dunn. Two Ways of Modeling Hospital Readmissions: Mixed and Marginal Models. Proceedings of JSM 2013
Tan, R Low, S Ito, R Gregory, V Dunn. Drug Interactions of Beta Blockers and Beta Agonists and Their Association with Hospital Admissions. Proceedings of SAS GF 2013
R Low, S Ito, R Gregory, L Rassi, Tan, C Jacobs. Hospital Readmission Rates: Related To ED Volume, Population, And Economic Variables. Academic Emergency Medicine 2012