SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

25 May 2025
Kun Xiang
Heng Li
Terry Jingchen Zhang
Yinya Huang
Zirong Liu
Peixin Qu
Jixi He
Jiaqi Chen
Yu-Jie Yuan
J. N. Han
Hang Xu
Hanhui Li
Mrinmaya Sachan
Xiaodan Liang
    LRM
Abstract

We present SeePhys, a large-scale multimodal benchmark for LLM reasoning grounded in physics questions ranging from middle school to PhD qualifying exams. The benchmark covers 7 fundamental domains spanning the physics discipline, incorporating 21 categories of highly heterogeneous diagrams. In contrast to prior works where visual elements mainly serve auxiliary purposes, our benchmark features a substantial proportion of vision-essential problems (75%) that mandate visual information extraction for correct solutions. Through extensive evaluation, we observe that even the most advanced visual reasoning models (e.g., Gemini-2.5-pro and o4-mini) achieve sub-60% accuracy on our benchmark. These results reveal fundamental challenges in current large language models' visual understanding capabilities, particularly in: (i) establishing rigorous coupling between diagram interpretation and physics reasoning, and (ii) overcoming their persistent reliance on textual cues as cognitive shortcuts.

@article{xiang2025_2505.19099,
  title={SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning},
  author={Kun Xiang and Heng Li and Terry Jingchen Zhang and Yinya Huang and Zirong Liu and Peixin Qu and Jixi He and Jiaqi Chen and Yu-Jie Yuan and Jianhua Han and Hang Xu and Hanhui Li and Mrinmaya Sachan and Xiaodan Liang},
  journal={arXiv preprint arXiv:2505.19099},
  year={2025}
}