I am a CS PhD student at Stanford, advised by Sanmi Koyejo in the STAIR Lab, where we work on trustworthy AI. My research focuses on developing scalable oversight mechanisms and aligning AI systems with human preferences, drawing on mechanism design, information theory, and complex systems to create principled frameworks for human-AI collaboration.
Recently, I developed information-theoretic mechanisms for evaluating AI systems without ground truth, proving that f-mutual information measures uniquely resist gaming while maintaining high discrimination between faithful and deceptive agents. This work demonstrates 10–100× greater robustness to adversarial manipulation than current LLM judges and has attracted significant attention from the research community. See our recent work featured on arXiv's front page, or watch this 2-minute explanation that caught the attention of AI industry leaders.
Prior to Stanford, I obtained a master's in CS from UIUC and a bachelor's in Computational and Applied Mathematics from the University of Chicago. I have gained diverse research experience through internships at Google, where I designed tractable surrogates for welfare maximization with applications in ad click-through-rate prediction; Lam Research, where I optimized semiconductor wafer production using reinforcement learning; and the Robot Intelligence through Perception Lab at TTIC, where I worked on sparse-depth completion and natural-language instruction following for robotic manipulation.
Contact me
Recent Highlights
- Paper on LLM evaluation mechanisms featured on arXiv ML front page (70k+ views)
- Lead author of successful OpenAI Superalignment Fast Grant ($500k)
- Invited talks at Max Planck Institute for Intelligent Systems and FloodGate
Publications
- Zachary Robertson and Sanmi Koyejo. Let’s Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes. arXiv:2508.05469, 2025
- Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, and Sanmi Koyejo. Measurement to Meaning: A Validity-Centered Framework for AI Evaluation. arXiv:2505.10573, 2025
- Zachary Robertson and Sanmi Koyejo. Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks. ICML, 2024
- Zachary Robertson, Hannah Cha, Andrew Sheha, and Sanmi Koyejo. Implementability of Information Elicitation Mechanisms with Pre-Trained Language Models. ICML 2024 Workshop on Theoretical Foundations of Foundation Models
- Zachary Robertson and Sanmi Koyejo. No Bidding, No Regret: Pairwise-Feedback Mechanisms for Digital Goods and Data Auctions. ISIT Workshop on Information-Theoretic Methods for Trustworthy Machine Learning, 2024
- Boxiang Lyu, Zhe Feng, Zachary Robertson, and Sanmi Koyejo. Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions. ICML, 2023
- Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Cooperative Inverse Decision Theory for Uncertain Preferences. AISTATS, 2023
- Zachary Robertson. GPT4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study. arXiv preprint arXiv:2307.05492, 2023
- Zachary Robertson, Hantao Zhang, and Sanmi Koyejo. Probabilistic Performance Metric Elicitation. 1st Workshop on Human and Machine Decisions (WHMD) at NeurIPS, 2021
- Zachary Robertson and Matthew Walter. Concurrent Training Improves the Performance of Behavioral Cloning from Observation. arXiv preprint arXiv:2008.01205, 2020
- Julian Stürmer, Andreas Seifahrt, Zachary Robertson, Christian Schwab, and Jacob L Bean. Echelle++, a Fast Generic Spectrum Simulator. Publications of the Astronomical Society of the Pacific, 131(996):024502, 2018