ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2503.21934
  4. Cited By
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

27 March 2025
Ivo Petrov
Jasper Dekoninck
Lyuben Baltadzhiev
Maria Drencheva
Kristian Minchev
Mislav Balunović
Nikola Jovanović
Martin Vechev
    LRM
    ELM
ArXivPDFHTML

Papers citing "Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad"

9 / 9 papers shown
Title
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities
Haoyu Zhao
Yihan Geng
Shange Tang
Yong Lin
Bohan Lyu
Hongzhou Lin
Chi Jin
Sanjeev Arora
5
0
0
19 May 2025
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
Guijin Son
Jiwoo Hong
Honglu Fan
Heejeong Nam
Hyunwoo Ko
...
Jinyeop Song
Jinha Choi
Gonçalo Paulo
Youngjae Yu
Stella Biderman
0
0
0
17 May 2025
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?
Pedro Orvalho
Marta Kwiatkowska
LRM
ELM
34
0
0
15 May 2025
RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning
RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning
Liam Boyle
Nicolas Baumann
Paviththiren Sivasothilingam
Michele Magno
Luca Benini
LM&Ro
LRM
51
0
0
06 May 2025
Phi-4-reasoning Technical Report
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
87
1
0
30 Apr 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu
Shaoyang Guo
Zhuo-Yang Song
Yizhou Sun
Zeyu Cai
...
Ming-xing Luo
Muhan Zhang
Yaodong Yang
Muhan Zhang
Hua Xing Zhu
AIMat
LRM
29
0
0
22 Apr 2025
AGI Is Coming... Right After AI Learns to Play Wordle
AGI Is Coming... Right After AI Learns to Play Wordle
Sarath Shekkizhar
Romain Cosentino
LLMAG
45
0
0
21 Apr 2025
Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability
Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability
Jennifer Haase
P. Hanel
Sebastian Pokutta
ALM
LRM
67
0
0
10 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
100
5
0
09 Apr 2025
1