I’m a CS PhD student at Stanford working on scalable oversight: information-theoretic mechanisms for evaluating AI systems without ground truth. I work end-to-end: proofs → benchmarks → pilots.
Highlights
- Funding: Lead on OpenAI Superalignment Fast Grant ($500k, 2024).
- Result: Information-theoretic framework for black-box LLM evaluation without ground truth, with 10–100× greater robustness to adversarial manipulation than prior methods (arXiv:2508.05469, Aug 2025).
- Adoption/recognition: Initial production deployments (LinkedIn); invited lightning talk at Building an Aligned AI Future (Fifty Years @ Stanford, Sept 2025) (event).
Research Program
Theory. We link incentive-compatible scoring rules to f-mutual information, showing that bounded f-divergences (e.g., total variation distance) maintain polynomial robustness under strategic attacks, while unbounded ones (e.g., KL divergence) can degrade exponentially.
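The central quantity here is the f-mutual information; in standard notation (a sketch of the usual definitions, which may differ from the paper's exact symbols):

```latex
% f-mutual information: the f-divergence between the joint distribution
% and the product of the marginals.
I_f(X;Y) \;=\; D_f\!\left(P_{XY} \,\middle\|\, P_X \otimes P_Y\right),
\qquad
D_f(P \,\|\, Q) \;=\; \mathbb{E}_{Q}\!\left[\, f\!\left(\tfrac{dP}{dQ}\right) \right].
% Bounded generator (TVD):  f(t) = |t - 1|/2, so D_f \le 1.
% Unbounded generator (KL): f(t) = t \log t, so D_f is unbounded.
```

Boundedness of the TVD generator caps how much any single manipulated report can move the score, which is the intuition behind the polynomial vs. exponential robustness gap.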
Experiments/Implementation. We instantiate these mechanisms for LLM evaluation without ground truth: querying models about information relationships between responses, rather than for quality judgments, yields robust item-level scores. In practice, TVD-MI achieves strong AUC under attack; code and preregistration are released.
Field response. Within weeks of the arXiv release, several labs launched alternative evaluation platforms, underscoring the urgency of this direction. We have also received feedback from mechanism-design experts (e.g., Yuqing Kong).
Adoption & Recognition
- ICML 2024 highlight; early production deployments (LinkedIn)
- Invited lightning talk: Paying for Information, Not Vibes at Building an Aligned AI Future (Fifty Years @ Stanford, Sept 2025) (event)
- Open science: preregistration (OSF), code, and complete proofs
Research Leadership & Funding
- Lead & primary researcher, OpenAI Superalignment Fast Grant ($500k, 2024): conceived agenda; executed from proposal to validation
- Advocacy: Led WellLabeled (Stanford HAI) on data-annotation worker rights; interviews with Turkopticon, Scale AI, and OpenAI; featured by Stanford HAI (article + video, Jul 2024); invited talk at FloodGate
- First-author theoretical and empirical work connecting mechanism design to ML evaluation
Talks & Public Engagement
- Paying for Information, Not Vibes. Lightning talk, Building an Aligned AI Future (Fifty Years @ Stanford), Sept 2025
- Invited talks (selected): Max Planck Institute for Intelligent Systems; FloodGate (2024)
Background (select)
- Theory: BS, Computational & Applied Math (UChicago)
- Systems: MS, CS (UIUC): metric elicitation & evaluation systems
- Industry: Google (ads welfare optimization); Lam Research (RL for semiconductor process control)
- Research: Toyota Technological Institute (perception & language grounding)
News
- Sept 2025: Invited lightning talk at Building an Aligned AI Future (Fifty Years @ Stanford) — event
- Aug 2025: First-author paper on evaluation mechanisms featured on arXiv front page
- July–Dec 2024: Invited talks: Max Planck Institute for Intelligent Systems; FloodGate
- April 2024: OpenAI Superalignment Fast Grant awarded ($500k)
- 2023–2024: Led WellLabeled advocacy group on data-annotation worker rights at Stanford HAI
Contact
CV · Google Scholar · GitHub
Last updated: September 2025.
Publications
- Zachary Robertson and Sanmi Koyejo. Let’s Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes. arXiv:2508.05469, 2025
- Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, and Sanmi Koyejo. Measurement to Meaning: A Validity-Centered Framework for AI Evaluation. arXiv:2505.10573, 2025
- Zachary Robertson and Sanmi Koyejo. Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks. ICML, 2024
- Zachary Robertson, Hannah Cha, Andrew Sheha, and Sanmi Koyejo. Implementability of Information Elicitation Mechanisms with Pre-Trained Language Models. ICML 2024 Workshop on Theoretical Foundations of Foundation Models
- Zachary Robertson and Sanmi Koyejo. No Bidding, No Regret: Pairwise-Feedback Mechanisms for Digital Goods and Data Auctions. ISIT Workshop on Information-Theoretic Methods for Trustworthy Machine Learning, 2024
- Boxiang Lyu, Zhe Feng, Zachary Robertson, and Sanmi Koyejo. Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions. ICML, 2023
- Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Cooperative Inverse Decision Theory for Uncertain Preferences. AISTATS, 2023
- Zachary Robertson. GPT4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study. arXiv:2307.05492, 2023
- Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Probabilistic Performance Metric Elicitation. 1st Workshop on Human and Machine Decisions (WHMD 2021) at NeurIPS, 2021
- Zachary Robertson and Matthew Walter. Concurrent Training Improves the Performance of Behavioral Cloning from Observation. arXiv:2008.01205, 2020
- Julian Stürmer, Andreas Seifahrt, Zachary Robertson, Christian Schwab, and Jacob L Bean. Echelle++, a Fast Generic Spectrum Simulator. Publications of the Astronomical Society of the Pacific, 131(996):024502, 2018