About

I study how to evaluate AI systems when ground truth is unavailable. My current focus is mutual evaluation: using interactions between agents to measure reliability without labeled answers. More broadly, I’m interested in how human preferences, stated and revealed, get encoded into agents and ranking systems, and where those processes break down.

I work across theory and practice. I contributed early work on chain-of-thought prompting, a technique for eliciting step-by-step reasoning in language models. In later reporting on the technique’s early independent origins, The Atlantic described me as a “co-inventor.” I continue to maintain one of the larger human-generated tree-of-thought systems for studying reasoning and evaluation, and I’ve used it to contribute early reasoning data for frontier models. The through-line is my interest in how illegible human signals become legible data for training machine behavior.

Themes

Scalable oversight: Information-theoretic evaluation of AI systems without ground truth. Current focus is mutual evaluation, using model interactions to measure reliability without labeled answers, along with theoretical foundations for robustness to adversarial manipulation (arXiv:2508.05469, Aug 2025). Supported by an OpenAI Superalignment Fast Grant ($500k, 2024).

Human signal construction: Methods and systems for eliciting, structuring, and validating human signals used to train and evaluate models. This includes early work on chain-of-thought prompting and reasoning decomposition, as well as research on annotation pipelines, red teaming, and aligning ranking systems (e.g., social media feeds) with users’ articulated values.

News & Engagement

  • Apr 2026: Featured in The Atlantic: The Strange Origin of AI’s ‘Reasoning’ Abilities.
  • Apr 2026: Coauthor on Value Alignment of Social Media Ranking Algorithms (CHI 2026), a generalizable method for aligning social media feeds with users’ values; a study with 400+ users on their own X/Twitter data showed that value-driven rankings produce meaningfully aligned feeds.
  • Sept 2025: Invited talk at Building an Aligned AI Future (Fifty Years @ Stanford): event.
  • Aug 2025: First-author on Let’s Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes, a paper on robust evaluation without ground truth: arXiv:2508.05469. Front-page.
  • July 2024: Featured by Stanford HAI for WellLabeled, a cross-disciplinary project on ethical challenges in AI data annotation and worker protections: article + video.
  • July 2024: Invited talks on scalable oversight, including at FloodGate and the Max Planck Institute for Intelligent Systems.
  • July 2024: Early production deployments related to scalable evaluation infrastructure: LinkedIn post.
  • May 2024: Lead on OpenAI Superalignment Fast Grant ($500k).

Experience

  • Stanford University 2022-present: PhD student in Computer Science working on scalable oversight, AI evaluation, and alignment.
  • University of Illinois Urbana-Champaign 2020-2022: MS in Computer Science, with work on metric elicitation and evaluation systems. GEM fellowship recipient.
  • University of Chicago 2016-2020: BS in Computational & Applied Mathematics. Jackie Robinson scholarship recipient.
  • Google 2022: Industry experience in ads welfare optimization.
  • Lam Research 2020-2021: Applied reinforcement learning to semiconductor process control.
  • Toyota Technological Institute 2019-2020: Research experience in perception and language grounding.
  • Stack Exchange: My questions and answers, primarily on math topics, have reached over 500k readers. Profile.

Contact

zroberts@stanford.edu

CV · Google Scholar · GitHub · X/Twitter · LinkedIn

Last updated: April 2026.

Selected Publications

  1. Farnaz Jahanbakhsh, Dora Zhao, Tiziano Piccardi, Zachary Robertson, Ziv Epstein, Sanmi Koyejo, and Michael S. Bernstein. Value Alignment of Social Media Ranking Algorithms. ACM CHI, 2026.
  2. Zachary Robertson and Sanmi Koyejo. Let’s Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes. arXiv:2508.05469, 2025.
  3. Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, and Sanmi Koyejo. Measurement to Meaning: A Validity-Centered Framework for AI Evaluation. arXiv:2505.10573, 2025.
  4. Zachary Robertson and Sanmi Koyejo. Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks. ICML, 2024.
  5. Zachary Robertson, Hannah Cha, Andrew Sheha, and Sanmi Koyejo. Implementability of Information Elicitation Mechanisms with Pre-Trained Language Models. ICML 2024 Workshop on Theoretical Foundations of Foundation Models, 2024.
  6. Zachary Robertson and Sanmi Koyejo. No Bidding, No Regret: Pairwise-Feedback Mechanisms for Digital Goods and Data Auctions. ISIT Workshop on Information-Theoretic Methods for Trustworthy Machine Learning, 2024.
  7. Boxiang Lyu, Zhe Feng, Zachary Robertson, and Sanmi Koyejo. Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions. ICML, 2023.
  8. Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Cooperative Inverse Decision Theory for Uncertain Preferences. AISTATS, 2023.
  9. Zachary Robertson. GPT4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study. arXiv:2307.05492, 2023.
  10. Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Probabilistic Performance Metric Elicitation. 1st Workshop on Human and Machine Decisions (WHMD 2021) at NeurIPS 2021, 2022.
  11. Zachary Robertson and Matthew Walter. Concurrent Training Improves the Performance of Behavioral Cloning from Observation. arXiv:2008.01205, 2020.