Sarah Tan

Hui Fen (Sarah) Tan
Google Scholar
LinkedIn
GitHub

I am a researcher interested in algorithmic fairness, causal inference, interpretability, and healthcare. Currently, I am a Director in Responsible AI at Salesforce. I also hold a Visiting Scientist appointment at Cornell University. I co-founded the Trustworthy ML Initiative and am president of Women in Machine Learning (WiML).

I received my PhD in Statistics from Cornell University, where I was advised by Giles Hooker and Martin Wells, with Thorsten Joachims and Rich Caruana on my committee.

Previously, I studied at Berkeley and Columbia, and worked in public policy in NYC, including the health department and public hospitals system. I was also a Data Science for Social Good fellow. I was fortunate to spend summers at Microsoft Research, working with Rich Caruana, Kori Inkpen, and Ece Kamar. Towards the end of my PhD studies, I was a visiting student and bioinformatics programmer at UCSF medical school. I joined Facebook after completing my PhD, and worked in Core Data Science before moving to Responsible AI.

Contact

I’m currently based in Seattle. You can reach me at ht395 AT cornell.edu.

News

3/24: I gave a guest lecture on Ethics in Computer Vision at Stanford University’s CS131 “Computer Vision: Foundations and Applications” class.
9/23: I was a panelist at Columbia Business School’s Challenges in Operationalizing Responsible AI workshop.
7/23: I will be co-organizing a workshop at NeurIPS 2023 on Regulatable ML. Submit your paper!
1/23: I have been elected president of the Women in Machine Learning organization (WiML).
1/23: I will be the Tutorial Chair for FAccT 2023.
9/21: I will be the Diversity & Inclusion Chair for AISTATS 2022.
10/19: I gave a talk at Data & Society’s Meeting on Fair ML in Health about risk scoring models in healthcare.
7/19: I had a blast helping out with UCSF’s AI4ALL program! I presented on dataset bias and helped mentor a project team using electronic medical records to predict opioid overdose and other conditions.
3/19: I’m excited to help teach the new ML for biomedicine course at UCSF. This is perhaps the first official ML course at UCSF and I’m looking forward to teaching again!
6/18: Honored to receive a Microsoft Research Dissertation Grant.
6/17: Grateful to receive the American Statistical Association Wray Jackson Smith Award.

For other news, click here.

Code & Data

Code for gradient boosted trees distance proposed in tree space prototypes paper
R package surfin (Statistical Inference for Random Forests)
Data and code for distilling black-box risk scores paper

Publications and Preprints

Journal and Conference Papers

Error Discovery By Clustering Influence Embeddings
- F Wang, J Adebayo, Tan, D Garcia-Olano, N Kokhlikyan
- NeurIPS 2023
- Also appeared in: ICLR 2023 Pitfalls of limited data and computation for Trustworthy ML Workshop (Oral)
Missing Values and Imputation in Healthcare Data: Can Interpretable Machine Learning Help?
- Z Chen, Tan, U Chajewska, C Rudin, R Caruana
- CHIL 2023
Considerations When Learning Additive Explanations for Black-Box Models
- Tan, G Hooker, P Koch, A Gordo, R Caruana
- Machine Learning 2023
- Also appeared in: NeurIPS 2018 Machine Learning for Health Workshop
Interpretable Personalized Experimentation
- H Wu, Tan, W Li, M Garrard, A Obeng, D Dimmery, S Singh, H Wang, D Jiang, E Bakshy
- KDD 2022
- Also appeared in: Conference on Digital Experimentation 2021 (Oral)
How Interpretable and Trustworthy are GAMs?
- CH Chang, Tan, B Lengerich, A Goldenberg, R Caruana
- KDD 2021
Do I Look Like a Criminal? Examining the Impact of Racial Information on Human Judgement
- K Mallari, K Inkpen, P Johns, Tan, D Ramesh, E Kamar
- CHI 2020
Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable
- Tan, M Soloviev, G Hooker, M Wells
- ACM-IMS FODS 2020
- Also appeared in: NIPS 2016 Interpretability Workshop
- Code
Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
- B Lengerich, Tan, CH Chang, G Hooker, R Caruana
- AISTATS 2020
Axiomatic Interpretability for Multiclass Additive Models
- X Zhang, Tan, P Koch, Y Lou, U Chajewska, R Caruana
- KDD 2019 (Oral)
- Video
Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation
- Tan, R Caruana, G Hooker, Y Lou
- AIES 2018 (Oral)
- Also appeared in: NIPS 2017 Interpretability Symposium (Spotlight), NIPS 2017 Transparent Machine Learning in Safety Critical Environments Workshop (Spotlight)
- Media coverage: MIT Technology Review, Politico, Futurism, WorkFlow
- Code and data
A Bayesian Evidence Synthesis Approach to Estimate Disease Prevalence in Hard-To-Reach Populations: Hepatitis C in New York City
- Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark
- Epidemics 2018
- Presented to NYC Health Commissioner. Talk at NDRI
- Code
“No Fracking Way!” Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013
- I Vasi, E Walker, JS Johnson, Tan
- American Sociological Review 2015
- 2 Best Paper Awards from American Sociological Association’s CITAMS and CBSM sections
- Media coverage: The Guardian, The Atlantic, Pacific Standard
- Press releases: University of Iowa, Harmony Institute

Preprints

Efficient Heterogeneous Treatment Effect Estimation With Multiple Experiments and Multiple Outcomes
- L Yao, C Lo, I Nir, Tan, A Evnine, A Lerer, A Peysakhovich
- Under review
- Preliminary version in Conference on Digital Experimentation 2021 (Oral)
Using Explainable Boosting Machines (EBMs) to Detect Common Flaws in Data
- Z Chen, Tan, H Nori, K Inkpen, Y Lou, R Caruana
- Preliminary version in ECML-PKDD International Workshop and Tutorial on eXplainable Knowledge Discovery in Data Mining 2021 (Oral)
Practical Policy Optimization with Personalized Experimentation
- M Garrard, H Wang, B Letham, S Singh, A Kazerouni, Tan, Z Wang, M Huang, Y Hu, C Zhou, N Zhou, E Bakshy
- Preliminary version in NeurIPS 2021 Causal Inference Challenges in Sequential Decision Making Workshop
Investigating Human + Machine Complementarity: A Case Study on Recidivism
- Tan, J Adebayo, K Inkpen, E Kamar
- Under review
- Preliminary version in NeurIPS 2018 Workshop on Ethical, Social and Governance Issues in AI (Spotlight)
“Why Should You Trust My Explanation?” Understanding Uncertainty in LIME Explanations
- Y Zhang, K Song, Y Sun, Tan, M Udell
- Preliminary version in ICML 2019 AI for Social Good Workshop
Proximity Score Matching: Locally Adaptive Matching for Causal Inference
- Tan, D Miller, J Savage
- Preliminary version in NIPS 2015 Machine Learning in Healthcare Workshop
- Lightning talk, Atlantic Causal Inference Conference 2015
- 1 of 3 Best Student Paper Awards from American Statistical Association’s SSPA section

For older publications and posters, click here.

Service

Women in Machine Learning organization (WiML) President (2023 - Present), Vice President (2019 - 2020), Director (2018 - 2019, 2020 - 2023)
Area chair: FAccT, CHIL, Machine Learning for Health Symposium, Algorithmic Fairness through the Lens of Causality and Privacy, Algorithms Towards Ethical and Privacy Challenges in Social Media Recommendation System
Reviewer:
- Conferences: NeurIPS, ICML, ICLR, AISTATS, FAccT, AIES, KDD, AAAI, WWW, HCOMP
- Journals: TMLR, JAIR, Nature, Machine Learning, TPAMI, TIST, Journal of Biomedical and Health Informatics
- Workshops: Fair ML for Health, Human-Centric Machine Learning, Machine Learning for Health, Human in the Loop Learning, Safe ML, Computer Vision for Agriculture, Algorithmic Fairness through the Lens of Causality and Interpretability, AI for Public Health Workshop, Algorithmic Fairness through the Lens of Causality and Robustness
- Programs: Data Science for Social Good
Co-organizer:
- NeurIPS 2023 Workshop “Regulatable ML” (together with Himabindu Lakkaraju, Jiaqi Ma, Chirag Agarwal)
- Tutorial Chair, FAccT 2023 (together with Sina Fazelpour, Angela Zhou)
- Diversity & Inclusion Chair, AISTATS 2022 (together with Pablo Samuel Castro)
- Trustworthy ML Initiative (together with Himabindu Lakkaraju, Sara Hooker, Subhabrata Majumdar, Chhavi Yadav, Chirag Agarwal, Jaydeep Borkar, Marta Lemanczyk, Haohan Wang)
- ICLR 2019 Workshop “Debugging Machine Learning Models” (together with Himabindu Lakkaraju, Julius Adebayo, Jacob Steinhardt, D. Sculley, Rich Caruana)
- Invited Session “New Advances in Causal Inference for Longitudinal and Survival Data” at International Conference on Health Policy Statistics (ICHPS) 2018 (together with Michael Elliott and James O’Malley)
- Topic-Contributed Session “Statistics for Social Good” at JSM 2016 (together with Rayid Ghani and Hadley Wickham)
- 2016 WiML Workshop (together with Diana Cai, Deborah Hanus, Isabel Valera, Rose Yu). The WiML Workshop has grown tremendously; the year I co-organized it, it had 600 attendees and 200 posters. I am most proud of the mentoring roundtables format we expanded that year, with 50 roundtables on research and career topics bringing together our attendees and experts in close conversation.
Mentor:
- Reviewing: Machine Learning for Health Workshop
- Submission: AI for Public Health Workshop
- Project: UCSF’s AI4ALL program
- Research: Causal Inference research roundtable at 2021 WiML Workshop
- Seeking funding: 2018, 2019, 2020 WiML Workshop
Student representative, ICHPS 2018 Scientific Committee
Cornell internal: I was president of the Statistics Graduate Society and co-organized (together with Ashudeep Singh) the Cornell Machine Learning reading group.

Miscellaneous

I played piano and (bad) ukulele in an Indian fusion Carnatic band. We have some videos here.