Michael Saxon

I study generative AI artifacts like LLMs and text-to-image models. I'm particularly interested in making meaningful evaluations of hard-to-measure new capabilities in these artifacts. I think ethical issues in GenAI both are important to address and motivate interesting new technical challenges.

Lately I've been thinking a lot about automated metrics for text-to-image systems. In continuous domains, unlike in the text domain, it is a challenge to characterize things like "knowledge" in outputs by comparing them to references. My recent work "CoCo-CroLa" was a start in assessing multilingual knowledge for T2Is.


Intern, AMD Research, Open GenAI (2024)


PhD student, NLP Lab, Computer Science, UC Santa Barbara (2020–)
Advised by William Yang Wang
Recipient, NSF Graduate Research Fellowship (2020)


Intern, Meta AI (Cognitive AI/Conversational AI Research) (2022)
Intern, Amazon Alexa Web-based QA (2021)
Intern, Amazon Alexa Hybrid Science (2019, 2020)

MS Computer Engineering, Arizona State University (2018–2020)
Advised by Visar Berisha & Sethuraman Panchanathan

BSE Electrical Engineering, Arizona State University (2014–2018)


5/4/2024 Had a great week visiting UMBC, Georgetown, UMD, and Johns Hopkins to present my work on rigorous measurement for text-to-image models. Check out the talk recording! [YouTube]

4/8/2024 We were surprised to find that fancy VLM-based text-to-image faithfulness metrics don't actually outperform simple correlation-based ones using our new meta-evaluation T2IScoreScore! Check out our interactive leaderboard! [arXiv:2404.04251]

3/18/2024 Excited about our work diving deeper into characterizing failure cases in translation for text-to-image model testing! [arXiv:2403.11092]

12/10/2023 Wanrong presented our (mostly Xinyi's) work analyzing how in-context learning works for LLMs at NeurIPS 2023! Check out the paper: [arXiv:2301.11916]

11/16/2023 Gave an invited talk on my recent work on assessing and improving multilingual knowledge and capabilities in T2I models at USC ISI! Video link: [YouTube]

11/3/2023 Our dataset/paper proposing the task of "video infilling and prediction" for assessing reasoning capabilities of VLMs has been accepted to EMNLP! Link: [arXiv:2305.13903]

8/6/2023 Our survey paper on self-correcting and automated correction methods for LLM workflows, "Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies," is up on arXiv! Link: [arXiv:2308.03188]

7/13/2023 Presented CoCo-CroLa in the paper "Multilingual Conceptual Coverage in Text-to-Image Models" at ACL 2023! See the benchmark demo [demo link] and paper in ACL Anthology: [ACL Anthology]

6/20/2023 Gave a talk at FAccT 2023 on CoCo-CroLa and our initial findings of interesting cross-lingual biases. Watch the talk! [YouTube]

5/8/2023 Presenting at ICLR 2023 was a blast! Check out the new dataset we presented in an oral, WikiWhy: [arXiv:2210.12152]

3/9/2023 Check out Alex's and my position paper, "Users are the North Star for AI Transparency," written in collaboration with our advisor William, and profs Shiyu Chang and Zack Lipton! You'll probably like it more than the IJCAI reviewers did 😉 Preprint: [arXiv:2303.05500]

1/23/2023 Two of my papers were accepted to ICLR 2023 and one to EACL 2023! In particular, I'm happy to share that WikiWhy, a new benchmark for analyzing reasoning in LMs using QA, got accepted as an oral presentation! Super proud of my undergrad group for getting such an honor at ICLR for their first paper! Preprint here: [arXiv:2210.12152]

12/20/2022 Check out my preprint "Multilingual Conceptual Coverage in Text-to-Image Models" on OpenReview! We quantify the degree to which T2I models including DALL-E and StableDiffusion contain representations of ~200 tangible concepts across EN, ES, DE, ZH, JA, HE, and ID. Preprint here: [OpenReview:5H2m3tCEaQ] Demo available here: [demo link]

11/18/2022 The 2022 Southern California NLP Symposium (SoCalNLP22) was a massive success! Co-chairing the program committee and participating in event organization was a great privilege and it was wonderful meeting everybody. Please check out our full event livestream [YouTube Link] and some event photos [Twitter:@m2saxon] [Twitter:@ucsbNLP]!

10/24/2022 Our general-purpose text-reference comparison metric that simulates human preferences for translation and summarization, SEScore, is now available on HuggingFace spaces! Preprint here: [arXiv:2210.05035]

10/12/2022 Our work "Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis" will appear in Findings of EMNLP 2022! Preprint here: [arXiv:2210.05035]

10/12/2022 Check out the latest preprint of my work "PECO: Examining Single Sentence Label Leakage in Natural Language Inference Datasets through Progressive Evaluation of Cluster Outliers" on arXiv! We demonstrated automated detection of spurious, annotator-driven correlations that lead to cheating features in NLI. Preprint here: [arXiv:2112.09237]

6/6/2022 Excited to start my 2022 AI Research Scientist Internship at Meta in Menlo Park!

12/3/2021 Our work "Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer" will appear at AAAI 2022! Preprint here: [arXiv:2110.02950]

11/8/2021 Had a great time presenting our Disclosive Transparency work at EMNLP 2021! Our work was even highlighted in an EMNLP overview article! Oral presentation prerecording: [YouTube]

10/1/2021 Our work "Counterfactual Maximum Likelihood Estimation for Training Deep Networks" will appear at NeurIPS 2021! Preprint here: [arXiv:2106.03831]

9/23/2021 Our work "Modeling Disclosive Transparency in NLP Application Descriptions" will appear at EMNLP 2021 as an oral presentation! Preprint here: [arXiv:2101.00433]

9/13/2021 I was profiled on the Amazon Science Blog about my experience doing multiple applied science internships with the company! Article here: [amazon.science]