Hui Fen (Sarah) Tan
Google Scholar

I am a research scientist in Responsible AI, working on fairness and experimentation topics. I am also interested in causal inference and interpretability. I received my PhD in Statistics from Cornell University, where I was advised by Giles Hooker and Martin Wells, with Thorsten Joachims and Rich Caruana on my committee. I co-founded the Trustworthy ML Initiative.

Previously, I studied at Berkeley and Columbia, and worked in public policy in NYC, including the health department and public hospitals system. I was also a Data Science for Social Good fellow. I was fortunate to spend summers at Microsoft Research, working with Rich Caruana, Kori Inkpen, and Ece Kamar. Towards the end of my PhD studies, I was a visiting student and bioinformatics programmer at UCSF medical school.


I’m currently based in Seattle. You can reach me at


  • 1/23: I have a new preprint coming out, “Error Discovery By Clustering Influence Embeddings”, with Fulton Wang, Julius Adebayo, Diego Garcia-Olano, and Narine Kokhlikyan. It was fun to compare influence embeddings and CLIP embeddings across different neural net architectures. We are excited to share it soon!
  • 1/23: I have been elected president of the Women in Machine Learning organization (WiML).
  • 1/23: I will be the Tutorial Chair for FAccT 2023.
  • 9/21: I will be the Diversity & Inclusion Chair for AISTATS 2022.
  • 6/21: I will be a discussant at the International Seminar on Selective Inference. Looking forward to discussing model distillation!
  • 2/21: The gradient boosted tree distance we propose in this paper has gotten some interest. Here is some code that illustrates how to calculate it.
  • 10/19: I gave a talk at Data & Society’s Meeting on Fair ML in Health about risk scoring models in healthcare.
  • 7/19: I had a blast helping out with UCSF’s AI4ALL program! I presented on dataset bias and helped mentor a project team using electronic medical records to predict opioid overdose and other conditions.
  • 4/19: I will be giving a talk at Columbia’s Data Science Institute in the Data for Good seminar series.
  • 3/19: I’m excited to help teach the new ML for biomedicine course at UCSF. This is perhaps the first official ML course at UCSF and I’m looking forward to teaching again!
  • 12/18: I’m co-organizing a workshop at ICLR 2019 on Debugging ML models. Submit your paper or demo!
  • 6/18: Honored to receive a Microsoft Research Dissertation Grant.
  • 6/17: Grateful to receive the American Statistical Association Wray Jackson Smith Award.
  • 3/17: My project evaluating the impact of later school start times in NYC public schools has received an Engaged Cornell grant. You can read more about it here and here.

For other news, click here.

Code & Data

Publications and Preprints

Journal and Conference Papers


For older publications and posters, click here.



  • I played piano and (bad) ukulele in an Indian fusion Carnatic band. We have some videos here.