
Aletheia tackles FirstProof autonomously

Tony Feng
Junehyuk Jung
Sang-hyun Kim
Carlo Pagano
Sergei Gukov
Chiang-Chiang Tsai
David Woodruff
Adel Javanmard
Aryan Mokhtari
Dawsen Hwang
Yuri Chervonyi
Jonathan N. Lee
Garrett Bingham
Trieu H. Trinh
Vahab Mirrokni
Quoc V. Le
Thang Luong
Main: 6 pages
Bibliography: 1 page
4 tables
Appendix: 34 pages
Abstract

We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 of the 10 problems (2, 5, 7, 8, 9, and 10) according to majority expert assessments; Problem 8 was the only one on which the experts were not unanimous. For full transparency, we explain our interpretation of FirstProof and disclose details of our experiments and our evaluation. Raw prompts and outputs are available at this https URL.
