Sarah Tan

Hui Fen (Sarah) Tan
Google Scholar
LinkedIn
GitHub

I am a researcher interested in AI safety, causal inference, interpretability, and healthcare. Currently, I am a Principal Research Scientist at Salesforce. I also hold a Visiting Scientist appointment at Cornell University in the College of Computing and Information Science.

I received my PhD in Statistics from Cornell University, where I was advised by Giles Hooker and Martin Wells, with Thorsten Joachims and Rich Caruana on my committee. My dissertation was on the interpretability of black-box AI models.

Previously, I studied at Berkeley and Columbia, and worked in public policy in NYC, including the health department and public hospitals system. I was fortunate to spend summers at Microsoft Research. Towards the end of my PhD studies, I was a visiting student and bioinformatics programmer at UCSF medical school. I joined Facebook after completing my PhD, and worked in Central Applied Science before moving to Responsible AI. I'm also interested in startups, stemming from my experience as a data scientist (part of the founding team) at an NLP startup pre-PhD.

Contact

You can reach me at ht395 AT cornell.edu.

News

9/24: Gave a guest lecture for the University of Southern California’s ENG 499 “Ethics in Engineering Design of AI Systems” class.
8/24: Representing Salesforce on a US AI Safety Institute task force.
8/24: Co-organizing 2nd edition of Regulatable ML workshop at NeurIPS 2024. Submit your paper!
5/24: Did a fireside chat in the University of Colorado Denver’s PUAD 6600 “AI for Public Sector Innovation” class.
9/23: Was a panelist at Columbia Business School’s Challenges in Operationalizing Responsible AI workshop.
1/23: I will be the Tutorial Chair for FAccT 2023.

For other news, click here.

Code & Data

Code for the gradient boosted trees distance proposed in the Tree Space Prototypes paper
R package surfin (Statistical Inference for Random Forests)
Data and code for the distilling black-box risk scores paper

Publications and Preprints

Journal and Conference Papers

Evaluating Cultural and Social Awareness of LLM Web Agents
- H Qiu, AR Fabbri, D Agarwal, KH Huang, Tan, N Peng, CS Wu
- NAACL 2025
- Also appeared in: SoCal NLP Symposium 2024 and C3NLP 2025 Workshop
Error Discovery By Clustering Influence Embeddings
- F Wang, J Adebayo, Tan, D Garcia-Olano, N Kokhlikyan
- NeurIPS 2023
- Also appeared in: ICLR 2023 Pitfalls of limited data and computation for Trustworthy ML Workshop (Oral)
Missing Values and Imputation in Healthcare Data: Can Interpretable Machine Learning Help?
- Z Chen, Tan, U Chajewska, C Rudin, R Caruana
- CHIL 2023
Considerations When Learning Additive Explanations for Black-Box Models
- Tan, G Hooker, P Koch, A Gordo, R Caruana
- Machine Learning 2023
- Also appeared in: NeurIPS 2018 Machine Learning for Health Workshop
Interpretable Personalized Experimentation
- H Wu, Tan, W Li, M Garrard, A Obeng, D Dimmery, S Singh, H Wang, D Jiang, E Bakshy
- KDD 2022
- Also appeared in: Conference on Digital Experimentation 2021 (Oral)
How Interpretable and Trustworthy are GAMs?
- CH Chang, Tan, B Lengerich, A Goldenberg, R Caruana
- KDD 2021
Do I Look Like a Criminal? Examining the Impact of Racial Information on Human Judgement
- K Mallari, K Inkpen, P Johns, Tan, D Ramesh, E Kamar
- CHI 2020
Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable
- Tan, M Soloviev, G Hooker, M Wells
- ACM-IMS FODS 2020
- Also appeared in: NIPS 2016 Interpretability Workshop
- Code
Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
- B Lengerich, Tan, CH Chang, G Hooker, R Caruana
- AISTATS 2020
Axiomatic Interpretability for Multiclass Additive Models
- X Zhang, Tan, P Koch, Y Lou, U Chajewska, R Caruana
- KDD 2019 (Oral)
- Video
Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation
- Tan, R Caruana, G Hooker, Y Lou
- AIES 2018 (Oral)
- Also appeared in: NIPS 2017 Interpretability Symposium (Spotlight), NIPS 2017 Transparent Machine Learning in Safety Critical Environments Workshop (Spotlight)
- Media coverage: MIT Technology Review, Politico, Futurism, WorkFlow
- Code and data
A Bayesian Evidence Synthesis Approach to Estimate Disease Prevalence in Hard-To-Reach Populations: Hepatitis C in New York City
- Tan, S Makela, D Heller, K Konty, S Balter, T Zheng, J Stark
- Epidemics 2018
- Presented to NYC Health Commissioner. Talk at NDRI
- Code
“No Fracking Way!” Documentary Film, Discursive Opportunity, and Local Opposition against Hydraulic Fracturing in the United States, 2010 to 2013
- I Vasi, E Walker, JS Johnson, Tan
- American Sociological Review 2015
- 2 Best Paper Awards from American Sociological Association’s CITAMS and CBSM sections
- Media coverage: The Guardian, The Atlantic, Pacific Standard
- Press releases: University of Iowa, Harmony Institute

Preprints

Generative Models, Humans, Predictive Models: Who Is Worse at High-Stakes Decision Making?
- K Mallari, J Adebayo, K Inkpen, MT Wells, A Gordo, Tan
- Under review
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
- R Murthy, L Yang, J Tan, T Awalgaonkar, Y Zhou, S Heinecke, S Desai, J Wu, R Xu, Tan, J Zhang, Z Liu, S Kokane, Z Liu, M Zhu, H Wang, C Xiong, S Savarese
- Under review
XForecast: Evaluating Natural Language Explanations for Time Series Forecasting
- T Aksu, C Liu, A Saha, Tan, C Xiong, D Sahoo
- Under review

For older publications and workshop papers, click here.

Service

Area Chair, NeurIPS 2025
Area Chair, FAccT 2023-2025
Area Chair, CHIL 2024-2025
Area Chair, Machine Learning for Health Symposium 2020-2023