I'm a PhD student at Cornell Statistics, minoring in Computer Science. I'm currently a visiting student at UCSF. Broadly, I work on interpretability of machine learning methods. I'm also interested in causal inference, healthcare applications, and algorithmic fairness.
I'm advised by Giles Hooker and Martin Wells, and Thorsten Joachims and Rich Caruana are on my committee. Previously, I studied at Berkeley and Columbia, and worked in public policy in NYC, including the health department and public hospitals system. In 2014, I was a Data Science for Social Good fellow. I spent summer 2015 at Xerox Research (now Naver Labs) and summers 2017 and 2018 at Microsoft Research, working with Rich Caruana. I'm on the board of the Women in Machine Learning organization.
I’m based in the SF Bay Area in 2018. You can reach me at ht395 AT cornell DOT edu.
I am on the job market! Here is a short CV. I will be attending NeurIPS in Montreal. Please reach out if you would like to meet.
News
- 12/18: Co-organizing a workshop on Debugging ML models at ICLR 2019! Stay tuned for details.
- 11/18: Giving a talk at the AT&T Labs Graduate Student Symposium
- 6/18: Honored to receive a Microsoft Research Dissertation Grant
- 5/18: Back at UC Santa Cruz to give a guest lecture on interpretability
- 4/18: Visiting Novartis Pharmaceuticals' Statistics Methodology Group
- 3/18: Giving talks at UC Santa Cruz and UCSF
- 6/17: Grateful to receive the American Statistical Association's Wray Jackson Smith Award
- 3/17: My project evaluating the impact of later school start times in NYC public schools has received an Engaged Cornell grant. You can read more about it here and here
For older news, click here.
Code & Data
- R package surfin (Statistical Inference for Random Forests)
- Data and code for the paper on distilling black-box risk scores
Publications & Preprints
- Learning Global Additive Explanations for Neural Nets Using Model Distillation
  - Tan, R Caruana, G Hooker, P Koch, A Gordo
  - Under review
- Investigating Human + Machine Complementarity for Recidivism Predictions
  - Tan, J Adebayo, K Inkpen, E Kamar
  - Under review
- Interpretability is Harder in the Multiclass Setting: Axiomatic Interpretability for Multiclass Additive Models
  - X Zhang, Tan, P Koch, Y Lou, U Chajewska, R Caruana
  - Under review
- Proximity Score Matching: Locally Adaptive Matching for Causal Inference
  - Tan, D Miller, J Savage
  - Full version in progress. Preliminary version in NIPS 2015 Machine Learning in Healthcare Workshop
  - Lightning talk, Atlantic Causal Inference Conference 2015
  - 1 of 3 Best Student Paper Awards from American Statistical Association’s SSPA section
- Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation
  - Tan, R Caruana, G Hooker, Y Lou
  - Oral at AAAI/ACM AIES 2018
  - Also appeared as: Spotlight at NIPS 2017 Interpretability Symposium, Spotlight at NIPS 2017 Transparent Machine Learning in Safety Critical Environments Workshop
  - Media coverage: MIT Technology Review, Politico, Futurism, WorkFlow
  - Code and data
- A Bayesian Evidence Synthesis Approach to Estimate Disease Prevalence in Hard-To-Reach Populations: Hepatitis C in New York City
- Interpretable Approaches to Detect Bias in Black-Box Models
  - AAAI/ACM AIES 2018 Doctoral Consortium
- A Double Parametric Bootstrap Test for Topic Models
  - S Seto, Tan, G Hooker, M Wells
  - NIPS 2017 Interpretability Symposium
- Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable
  - Tan, G Hooker, M Wells
  - NIPS 2016 Interpretability Workshop
- “No Fracking Way!” Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013
For older publications, click here.
- Co-organizer (together with Himabindu Lakkaraju, Julius Adebayo, Rich Caruana), ICLR 2019 Workshop “Debugging Machine Learning Models”
- Board member, Women in Machine Learning organization (WiML)
- Mentor, 2018 WiML Workshop mentoring roundtables
- Co-organizer (together with Michael Elliott and James O’Malley), Invited Session “New Advances in Causal Inference for Longitudinal and Survival Data” at International Conference on Health Policy Statistics (ICHPS) 2018
- Student representative, ICHPS 2018 Scientific Committee
- Co-organizer (together with Rayid Ghani and Hadley Wickham), Topic-Contributed Session “Statistics for Social Good” at JSM 2016
- Co-organizer (together with Diana Cai, Deborah Hanus, Isabel Valera, Rose Yu), 2016 WiML Workshop. The WiML Workshop has grown tremendously; the year I co-organized it, it had 600 attendees and 200 posters. I am most proud of the mentoring roundtables format we introduced that year: 50 roundtables on research and career topics that brought our attendees and experts together in close conversation.
- Cornell internal: I was president of the Statistics Graduate Society and co-organized (together with Ashudeep Singh) the Cornell Machine Learning reading group