I recently defended my dissertation and started as a research scientist at Facebook. I received my PhD from Cornell Statistics, where I was advised by Giles Hooker and Martin Wells, with Thorsten Joachims and Rich Caruana on my committee. During the later years of my PhD studies, I was based in the SF Bay Area, where I was a visiting student and bioinformatics programmer at UCSF.
Broadly, I work on interpretability of machine learning methods. I'm also interested in algorithmic fairness and causal inference. I particularly enjoy working on methods useful for healthcare and public policy.
Previously, I studied at Berkeley and Columbia, and worked in public policy in NYC, including the health department and public hospitals system. In 2014, I was a Data Science for Social Good fellow. I spent two summers at Microsoft Research, working with Rich Caruana, Kori Inkpen, and Ece Kamar. I'm on the board of the Women in Machine Learning organization.
I’m currently based in the SF Bay Area. You can reach me at ht395 AT cornell DOT edu.
- 10/19: I gave a talk at Data & Society’s Meeting on Fair ML in Health about risk scoring models in healthcare.
- 7/19: I had a blast helping out with UCSF’s AI4ALL program! I presented on dataset bias and helped mentor a project team using electronic medical records to predict opioid overdose and other conditions.
- 4/19: I will be giving a talk at Columbia’s Data Science Institute in the Data for Good seminar series.
- 3/19: I’m excited to help teach the new ML for biomedicine course at UCSF. This is perhaps the first official ML course at UCSF, and I’m looking forward to teaching again!
- 12/18: I’m co-organizing a workshop at ICLR 2019 on Debugging ML models. Submit your paper or demo!
- 6/18: Honored to receive a Microsoft Research Dissertation Grant.
- 6/17: Grateful to receive the American Statistical Association Wray Jackson Smith Award.
- 3/17: My project evaluating the impact of later school start times in NYC public schools has received an Engaged Cornell grant. You can read more about it here and here.
For older news, click here.
Code & Data
- R package surfin (Statistical Inference for Random Forests)
- Data and code for distilling black-box risk scores paper
Publications and Preprints
- Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable
- Tan, M Soloviev, G Hooker, M Wells
- Under review
- Preliminary version in NIPS 2016 Interpretability Workshop
- Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
- B Lengerich, Tan, CH Chang, G Hooker, R Caruana
- Under review
- Learning Global Additive Explanations for Neural Nets Using Model Distillation
- Tan, R Caruana, G Hooker, P Koch, A Gordo
- Under review
- Preliminary version in NeurIPS 2018 Machine Learning for Health Workshop
- Investigating Human + Machine Complementarity: A Case Study on Recidivism
- Tan, J Adebayo, K Inkpen, E Kamar
- Under review
- Preliminary version in NeurIPS 2018 Workshop on Ethical, Social and Governance Issues in AI (Spotlight)
- Proximity Score Matching: Locally Adaptive Matching for Causal Inference
- Tan, D Miller, J Savage
- Full version in progress. Preliminary version in NIPS 2015 Machine Learning in Healthcare Workshop
- Lightning talk, Atlantic Causal Inference Conference 2015
- 1 of 3 Best Student Paper Awards from American Statistical Association’s SSPA section
Journal and Conference Papers
- Axiomatic Interpretability for Multiclass Additive Models
- X Zhang, Tan, P Koch, Y Lou, U Chajewska, R Caruana
- KDD 2019 (Oral)
- Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation
- A Bayesian Evidence Synthesis Approach to Estimate Disease Prevalence in Hard-To-Reach Populations: Hepatitis C in New York City
- “No Fracking Way!” Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013
Posters and Workshop Papers
- “Why Should You Trust My Explanation?” Understanding Uncertainty in LIME Explanations
- Y Zhang, K Song, Y Sun, Tan, M Udell
- ICML 2019 AI for Social Good Workshop
- Teaching biomedical applications of computer vision using docker containers
- DS Lituiev, Tan, A Bishara, JH Sohn, J Kornak, D Hadley
- UC Conference on AI in Biomedicine 2019
- A Double Parametric Bootstrap Test for Topic Models
- S Seto, Tan, G Hooker, M Wells
- NIPS 2017 Interpretability Symposium
- Probabilistic Matching: Incorporating Uncertainty to Improve Propensity Score Matching
- Tan, G Hooker, M Wells
- NIPS 2016 Causal Inference Workshop
For older publications and posters, click here.
- Co-organizer (together with Himabindu Lakkaraju, Julius Adebayo, Jacob Steinhardt, D. Sculley, Rich Caruana), ICLR 2019 Workshop “Debugging Machine Learning Models”
- Program committee:
- Vice president and executive board member, Women in Machine Learning organization (WiML)
- Mentor, 2019 UCSF’s AI4ALL program
- Mentor, 2018 WiML Workshop mentoring roundtables
- Co-organizer (together with Michael Elliott and James O’Malley), Invited Session “New Advances in Causal Inference for Longitudinal and Survival Data” at International Conference on Health Policy Statistics (ICHPS) 2018
- Student representative, ICHPS 2018 Scientific Committee
- Co-organizer (together with Rayid Ghani and Hadley Wickham), Topic-Contributed Session “Statistics for Social Good” at JSM 2016
- Co-organizer (together with Diana Cai, Deborah Hanus, Isabel Valera, Rose Yu), 2016 WiML Workshop. The WiML Workshop has grown tremendously; the year I co-organized it, it had 600 attendees and 200 posters. I am most proud of the mentoring roundtables format we expanded that year, with 50 roundtables on research and career topics that brought our attendees and experts together in close conversation.
- Cornell internal: I was president of the Statistics Graduate Society and co-organized (together with Ashudeep Singh) the Cornell Machine Learning reading group
- I played piano and (bad) ukulele in an Indian fusion Carnatic band. We have some videos here.