PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
arXiv: 2502.12054

17 February 2025
Xinyu Zhang
Yuxuan Dong
Yongpeng Wu
Jiaxing Huang
Chengyou Jia
Basura Fernando
Mike Zheng Shou
Lingling Zhang
Jun Liu
    AIMat
    ReLM
    LRM

Papers citing "PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning"

30 / 30 papers shown
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
Kun Xiang
Heng Li
Terry Jingchen Zhang
Yinya Huang
Zirong Liu
...
J. N. Han
Hang Xu
Hanhui Li
Mrinmaya Sachan
Xiaodan Liang
LRM
102
0
0
25 May 2025
SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution
Philipp D. Siedler
42
0
0
21 May 2025
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
Song Dai
Yibo Yan
Jiamin Su
Dongfang Zihao
Yubo Gao
...
Jungang Li
Junyan Zhang
Sicheng Tao
Zhuoran Gao
Xuming Hu
LRM
AI4CE
49
0
0
21 May 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw
ELM
88
0
0
19 May 2025
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
Yunzhuo Hao
Jiawei Gu
Huichen Will Wang
Linjie Li
Zhiyong Yang
Lijuan Wang
Yu Cheng
LRM
80
32
0
10 Jan 2025
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Chujie Zheng
Zizhuo Zhang
Beichen Zhang
Runji Lin
Keming Lu
Bowen Yu
Dayiheng Liu
Jingren Zhou
Junyang Lin
LRM
157
71
0
09 Dec 2024
MinerU: An Open-Source Solution for Precise Document Content Extraction
Bin Wang
Chao Xu
Xiaomeng Zhao
Linke Ouyang
Fan Wu
...
Wei Li
Botian Shi
Yu Qiao
Dahua Lin
Conghui He
42
40
0
27 Sep 2024
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
An Yang
Beichen Zhang
Binyuan Hui
Bofei Gao
Bowen Yu
...
Mingfeng Xue
Runji Lin
Tianyu Liu
Xingzhang Ren
Zhenru Zhang
OSLM
LRM
74
276
0
18 Sep 2024
Controllable Text Generation for Large Language Models: A Survey
Xun Liang
Hanyu Wang
Yezhaohui Wang
Shichao Song
Jiawei Yang
...
Jie Hu
Dan Liu
Shunyu Yao
Feiyu Xiong
Zhiyu Li
35
20
0
22 Aug 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
143
626
0
06 Aug 2024
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li
Yingbing Huang
Bowen Yang
Bharat Venkitesh
Acyr Locatelli
Hanchen Ye
Tianle Cai
Patrick Lewis
Deming Chen
VLM
107
191
0
22 Apr 2024
Vision-Language Model-based Physical Reasoning for Robot Liquid Perception
Wenqiang Lai
Yuan Gao
T. Lam
LRM
LM&Ro
105
7
0
10 Apr 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM
AIMat
84
243
0
21 Feb 2024
Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang
Yue Chen
Zhu Li
ELM
AI4CE
LRM
ALM
LM&Ro
78
15
0
20 Nov 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
74
655
0
20 Nov 2023
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
Fangzhi Xu
Zhiyong Wu
Qiushi Sun
Siyu Ren
Fei Yuan
Shuai Yuan
Qika Lin
Yu Qiao
Jun Liu
LLMAG
58
36
0
15 Nov 2023
DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy
Hongda Sun
Weikai Xu
Wei Liu
Jian Luan
Bin Wang
Shuo Shang
Ji-Rong Wen
Rui Yan
LRM
96
26
0
28 Oct 2023
Physically Grounded Vision-Language Models for Robotic Manipulation
Jensen Gao
Bidipta Sarkar
F. Xia
Ted Xiao
Jiajun Wu
Brian Ichter
Anirudha Majumdar
Dorsa Sadigh
LM&Ro
65
128
0
05 Sep 2023
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun
Yang Han
Zihan Zhao
Da Ma
Zhe-Wei Shen
Baocai Chen
Lu Chen
Kai Yu
ELM
63
82
0
25 Aug 2023
Forward-Backward Reasoning in Large Language Models for Mathematical Verification
Weisen Jiang
Han Shi
L. Yu
Zheng Liu
Yu Zhang
Zhenguo Li
James T. Kwok
LRM
77
28
0
15 Aug 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
39
105
0
20 Jul 2023
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond
Fangzhi Xu
Qika Lin
Jiawei Han
Tianzhe Zhao
Jun Liu
Min Zhang
ELM
LRM
96
36
0
16 Jun 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
139
1,122
0
31 May 2023
Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models
Daman Arora
H. Singh
Mausam
ELM
LRM
82
54
0
24 May 2023
TheoremQA: A Theorem-driven Question Answering dataset
Wenhu Chen
Ming Yin
Max Ku
Pan Lu
Yixin Wan
Xueguang Ma
Jianyu Xu
Xinyi Wang
Tony Xia
AIMat
75
134
0
21 May 2023
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Yuzhen Huang
Yuzhuo Bai
Zhihao Zhu
Junlei Zhang
Jinghan Zhang
...
Yikai Zhang
Jiayi Lei
Yao Fu
Maosong Sun
Junxian He
ELM
LRM
62
532
0
15 May 2023
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lu
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
ALM
ELM
68
528
0
13 Apr 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
250
1,230
0
20 Sep 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
225
4,354
0
27 Oct 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
157
4,377
0
07 Sep 2020