About

I’m a CS PhD candidate at Stanford working on scalable oversight: information-theoretic mechanisms for evaluating AI systems without ground truth. I work end-to-end: proofs → benchmarks → pilots.

Highlights

  • Funding: Lead on OpenAI Superalignment Fast Grant ($500k, 2024).
  • Result: Info-theoretic framework for black-box LLM evaluation without ground truth, achieving 10–100× greater robustness to adversarial manipulation than prior methods (arXiv:2508.05469, Aug 2025).
  • Adoption/recognition: Initial production deployments (LinkedIn); invited lightning talk at Building an Aligned AI Future (Fifty Years @ Stanford, Sept 2025) (event).

Research Program

Theory. We link incentive-compatible scoring rules to f-mutual information, showing that bounded f-divergences (e.g., total variation distance) maintain polynomial robustness under strategic attacks, while unbounded ones (e.g., KL divergence) can degrade exponentially.
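
In symbols, using the standard definitions (a paraphrase of the framework, not a quote from the paper): the f-mutual information is the f-divergence between the joint law of two agents’ reports and the product of its marginals, and the TVD case admits a variational form over bounded critics. Because the critic is bounded in [0, 1], the score itself is bounded, which is what caps how much a strategic report can gain.

```latex
% f-mutual information between reports X and Y:
I_f(X;Y) \;=\; D_f\!\left(P_{XY}\,\middle\|\,P_X \otimes P_Y\right),
% total-variation case via its variational representation,
% with critics f bounded in [0,1]:
\qquad
I_{\mathrm{TVD}}(X;Y) \;=\;
\sup_{f:\,\mathcal{X}\times\mathcal{Y}\to[0,1]}
\mathbb{E}_{P_{XY}}[f(X,Y)] \;-\; \mathbb{E}_{P_X\otimes P_Y}[f(X,Y)].
```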

Experiments/Implementation. We instantiate these mechanisms for LLM evaluation without ground truth: querying models about information relationships between responses (rather than for quality judgments) yields item-level scores that are robust to manipulation. In practice, TVD-MI (total-variation mutual information) maintains strong AUC under attack; code and a preregistration are released.
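
To make the mechanism concrete, here is a minimal toy sketch (hypothetical names and a stand-in critic, not the released code): the TVD-MI score is a critic’s mean score on matched response pairs (same item) minus its mean score on mismatched pairs (different items).

```python
# Toy sketch of a TVD-MI style score (illustrative only). It uses the
# variational form TVD(P, Q) = sup over critics f in [0, 1] of
# E_P[f] - E_Q[f], with P = responses to the SAME item (joint) and
# Q = responses to independent items (product of marginals).

def tvd_mi_estimate(responses_a, responses_b, critic):
    """Critic-based lower bound on the TVD mutual information between
    two agents' responses; critic(a, b) in [0, 1] guesses whether the
    two responses describe the same underlying item."""
    n = len(responses_a)
    assert n == len(responses_b) and n > 1
    # Matched pairs approximate the joint distribution.
    joint = sum(critic(a, b) for a, b in zip(responses_a, responses_b)) / n
    # Mismatched pairs (i != j) approximate the product of marginals.
    marginal = sum(
        critic(responses_a[i], responses_b[j])
        for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    return joint - marginal

def keyword_critic(a, b):
    """Stand-in for an LLM critic: 1.0 iff both responses mention a
    common item keyword (hypothetical toy vocabulary)."""
    keys = {"cat", "dog", "car"}
    return float(bool(set(a.split()) & set(b.split()) & keys))
```

An informative agent pair scores near 1; an agent that emits the same response for every item scores near 0, since matched and mismatched pairs then look identical to the critic. No ground-truth labels enter the computation.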

Field response. Within weeks of the arXiv release, several labs launched alternative evaluation platforms, underscoring the urgency of this direction. We have also received feedback from mechanism-design experts (e.g., Yuqing Kong).

Adoption & Recognition

  • ICML 2024 highlight; early production deployments (LinkedIn)
  • Invited lightning talk: Paying for Information, Not Vibes at Building an Aligned AI Future (Fifty Years @ Stanford, Sept 2025) (event)
  • Open science: preregistration (OSF), code, and complete proofs

Research Leadership & Funding

  • Lead & primary researcher, OpenAI Superalignment Fast Grant ($500k, 2024): conceived agenda; executed from proposal to validation
  • Advocacy: Led WellLabeled (Stanford HAI) on data-annotation worker rights; interviews with Turkopticon, Scale AI, and OpenAI; featured by Stanford HAI (article + video, Jul 2024); invited talk at FloodGate
  • First-author theoretical and empirical work connecting mechanism design to ML evaluation

Talks & Public Engagement

  • Paying for Information, Not Vibes. Lightning talk, Building an Aligned AI Future (Fifty Years @ Stanford), Sept 2025
  • Invited talks (selected): Max Planck Institute for Intelligent Systems; FloodGate (2024)

Background (select)

  • Theory: BS, Computational & Applied Math (UChicago)
  • Systems: MS, CS (UIUC): metric elicitation & evaluation systems
  • Industry: Google (ads welfare optimization); Lam Research (RL for semiconductor process control)
  • Research: Toyota Technological Institute (perception & language grounding)

News

  • Sept 2025: Invited lightning talk at Building an Aligned AI Future (Fifty Years @ Stanford) (event)
  • Aug 2025: First-author paper on evaluation mechanisms featured on arXiv front page
  • July–Dec 2024: Invited talks: Max Planck Institute for Intelligent Systems; FloodGate
  • April 2024: OpenAI Superalignment Fast Grant awarded ($500k)
  • 2023–2024: Led WellLabeled advocacy group on data-annotation worker rights at Stanford HAI

Contact

zroberts@stanford.edu

CV · Google Scholar · GitHub

Last updated: September 2025.

Publications

  1. Zachary Robertson and Sanmi Koyejo. Let’s Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes. arXiv:2508.05469, 2025
  2. Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, and Sanmi Koyejo. Measurement to Meaning: A Validity-Centered Framework for AI Evaluation. arXiv:2505.10573, 2025
  3. Zachary Robertson and Sanmi Koyejo. Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks. ICML, 2024
  4. Zachary Robertson, Hannah Cha, Andrew Sheha, and Sanmi Koyejo. Implementability of Information Elicitation Mechanisms with Pre-Trained Language Models. ICML 2024 Workshop on Theoretical Foundations of Foundation Models
  5. Zachary Robertson and Sanmi Koyejo. No Bidding, No Regret: Pairwise-Feedback Mechanisms for Digital Goods and Data Auctions. ISIT Workshop on Information-Theoretic Methods for Trustworthy Machine Learning, 2024
  6. Boxiang Lyu, Zhe Feng, Zachary Robertson, and Sanmi Koyejo. Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions. ICML, 2023
  7. Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Cooperative Inverse Decision Theory for Uncertain Preferences. AISTATS, 2023
  8. Zachary Robertson. GPT4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study. arXiv:2307.05492, 2023
  9. Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Probabilistic Performance Metric Elicitation. 1st Workshop on Human and Machine Decisions (WHMD 2021) at NeurIPS 2021
  10. Zachary Robertson and Matthew Walter. Concurrent Training Improves the Performance of Behavioral Cloning from Observation. arXiv:2008.01205, 2020
  11. Julian Stürmer, Andreas Seifahrt, Zachary Robertson, Christian Schwab, and Jacob L Bean. Echelle++, a Fast Generic Spectrum Simulator. Publications of the Astronomical Society of the Pacific, 131(996):024502, 2018