I’m a CS PhD student at Stanford working on scalable oversight: information-theoretic mechanisms for evaluating AI systems without ground truth. I work end-to-end: proofs → benchmarks → pilots.
Highlights
- Funding: Lead on OpenAI Superalignment Fast Grant ($500k, 2024).
- Result: Information-theoretic framework for black-box LLM evaluation without ground truth, with 10–100× greater robustness to adversarial manipulation than prior methods (arXiv:2508.05469, Aug 2025).
- Adoption/recognition: Initial production deployments (LinkedIn); invited lightning talk at Building an Aligned AI Future (Fifty Years @ Stanford, Sept 2025) (event).
Research Program
Theory. We link incentive-compatible scoring rules to f-mutual information, showing that bounded f-divergences (e.g., total variation distance) maintain polynomial robustness under strategic attacks, while unbounded ones (e.g., KL divergence) can degrade exponentially.
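The central quantity here is the f-mutual information; in standard notation (a sketch of the usual definitions, which may differ from the paper's exact symbols):

```latex
% f-mutual information: the f-divergence between the joint distribution
% and the product of the marginals.
I_f(X;Y) \;=\; D_f\!\left(P_{XY} \,\middle\|\, P_X \otimes P_Y\right),
\qquad
D_f(P \,\|\, Q) \;=\; \mathbb{E}_{Q}\!\left[\, f\!\left(\tfrac{dP}{dQ}\right) \right].
% Bounded generator (TVD):  f(t) = |t - 1|/2, so D_f \le 1.
% Unbounded generator (KL): f(t) = t \log t, so D_f is unbounded.
```

Boundedness of the TVD generator caps how much any single manipulated report can move the score, which is the intuition behind the polynomial vs. exponential robustness gap.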
Experiments/Implementation. We instantiate these mechanisms for LLM evaluation without ground truth: querying models about information relationships between responses, rather than for quality judgments, yields robust item-level scores. In practice, TVD-MI achieves strong AUC under attack; code and preregistration are released.
Field response. Within weeks of the arXiv release, several labs launched alternative evaluation platforms, underscoring the urgency of this direction. We have also received feedback from mechanism-design experts (e.g., Yuqing Kong).
Adoption & Recognition
- ICML 2024 highlight; early production deployments (LinkedIn)
- Invited lightning talk: Paying for Information, Not Vibes at Building an Aligned AI Future (Fifty Years @ Stanford, Sept 2025) (event)
- Open science: preregistration (OSF), code, and complete proofs
Research Leadership & Funding
- Lead & primary researcher, OpenAI Superalignment Fast Grant ($500k, 2024): conceived agenda; executed from proposal to validation
- Advocacy: Led WellLabeled (Stanford HAI) on data-annotation worker rights; interviews with Turkopticon, Scale AI, and OpenAI; featured by Stanford HAI (article + video, Jul 2024); invited talk at FloodGate
- First-author theoretical and empirical work connecting mechanism design to ML evaluation
Talks & Public Engagement
- Paying for Information, Not Vibes. Lightning talk, Building an Aligned AI Future (Fifty Years @ Stanford), Sept 2025
- Invited talks (selected): Max Planck Institute for Intelligent Systems; FloodGate (2024)
Background (select)
- Theory: BS, Computational & Applied Math (UChicago)
- Systems: MS, CS (UIUC): metric elicitation & evaluation systems
- Industry: Google (ads welfare optimization); Lam Research (RL for semiconductor process control)
- Research: Toyota Technological Institute (perception & language grounding)
News
- Sept 2025: Invited lightning talk at Building an Aligned AI Future (Fifty Years @ Stanford) — event
- Aug 2025: First-author paper on evaluation mechanisms featured on arXiv front page
- July–Dec 2024: Invited talks: Max Planck Institute for Intelligent Systems; FloodGate
- April 2024: OpenAI Superalignment Fast Grant awarded ($500k)
- 2023–2024: Led WellLabeled advocacy group on data-annotation worker rights at Stanford HAI
Contact
CV · Google Scholar · GitHub
Last updated: September 2025.
Publications
- Zachary Robertson and Sanmi Koyejo. Let’s Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes. arXiv:2508.05469, 2025
- Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, and Sanmi Koyejo. Measurement to Meaning: A Validity-Centered Framework for AI Evaluation. arXiv:2505.10573, 2025
- Zachary Robertson and Sanmi Koyejo. Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks. ICML, 2024
- Zachary Robertson, Hannah Cha, Andrew Sheha, and Sanmi Koyejo. Implementability of Information Elicitation Mechanisms with Pre-Trained Language Models. ICML 2024 Workshop on Theoretical Foundations of Foundation Models
- Zachary Robertson and Sanmi Koyejo. No Bidding, No Regret: Pairwise-Feedback Mechanisms for Digital Goods and Data Auctions. ISIT Workshop on Information-Theoretic Methods for Trustworthy Machine Learning, 2024
- Boxiang Lyu, Zhe Feng, Zachary Robertson, and Sanmi Koyejo. Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions. ICML, 2023
- Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Cooperative Inverse Decision Theory for Uncertain Preferences. AISTATS, 2023
- Zachary Robertson. GPT4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study. arXiv:2307.05492, 2023
- Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Probabilistic Performance Metric Elicitation. 1st Workshop on Human and Machine Decisions (WHMD 2021) at NeurIPS, 2021
- Zachary Robertson and Matthew Walter. Concurrent Training Improves the Performance of Behavioral Cloning from Observation. arXiv:2008.01205, 2020
- Julian Stürmer, Andreas Seifahrt, Zachary Robertson, Christian Schwab, and Jacob L Bean. Echelle++, a Fast Generic Spectrum Simulator. Publications of the Astronomical Society of the Pacific, 131(996):024502, 2018