2103.03874
Measuring Mathematical Problem Solving With the MATH Dataset
5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
Papers citing
"Measuring Mathematical Problem Solving With the MATH Dataset"
50 / 1,407 papers shown
Title
Adaptive Rectification Sampling for Test-Time Compute Scaling
Zhendong Tan
Xingjun Zhang
Chaoyi Hu
Yancheng Pan
Shaoxun Wang
LRM
31
0
0
02 Apr 2025
YourBench: Easy Custom Evaluation Sets for Everyone
S. Kamath S
Clémentine Fourrier
Alina Lozovskia
Thomas Wolf
Gökhan Tür
Dilek Hakkani-Tür
35
2
0
02 Apr 2025
Hawkeye: Efficient Reasoning with Model Collaboration
Jianshu She
Z. Li
Zhemin Huang
Qi Li
Peiran Xu
Haonan Li
Qirong Ho
LRM
58
1
0
01 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
52
1
0
01 Apr 2025
HERA: Hybrid Edge-cloud Resource Allocation for Cost-Efficient AI Agents
Shiyi Liu
Haiying Shen
Shuai Che
Mahdi Ghandi
Mingqin Li
LLMAG
48
0
0
01 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Z. Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
95
3
0
01 Apr 2025
Z1: Efficient Test-time Scaling with Code
Zhaojian Yu
Yinghao Wu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
37
2
0
01 Apr 2025
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi
Hritik Bansal
Arian Hosseini
Aditya Grover
Kai-Wei Chang
Marcus Rohrbach
Anna Rohrbach
OffRL
LRM
37
0
0
01 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
70
0
0
01 Apr 2025
AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems
Y. Yang
Huacan Chai
Shuai Shao
Y. Song
Siyuan Qi
Renting Rui
Weinan Zhang
AIFin
41
0
0
01 Apr 2025
TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance
Jingxian Xu
Mengyu Zhou
W. Liu
Hanbing Liu
Shi Han
Dongmei Zhang
LRM
45
1
0
31 Mar 2025
Entropy-Based Adaptive Weighting for Self-Training
Xiaoxuan Wang
Yihe Deng
Mingyu Derek Ma
Wei Wang
LRM
47
0
0
31 Mar 2025
Do Large Language Models Exhibit Spontaneous Rational Deception?
Samuel M. Taylor
Benjamin K. Bergen
LRM
45
0
0
31 Mar 2025
DebFlow: Automating Agent Creation via Agent Debate
Jinwei Su
Yinghui Xia
Ronghua Shi
Jianhui Wang
Jianuo Huang
Y. Wang
Tianyu Shi
Yang Jingsong
Lewei He
30
0
0
31 Mar 2025
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song
Xuwei Ding
Jieyu Zhang
Taiwei Shi
Ryotaro Shimizu
Rahul Gupta
Y. Liu
Jian Kang
Jieyu Zhao
KELM
58
0
0
30 Mar 2025
Codehacks: A Dataset of Adversarial Tests for Competitive Programming Problems Obtained from Codeforces
Max Hort
Leon Moonen
39
0
0
30 Mar 2025
ToRL: Scaling Tool-Integrated RL
Xuefeng Li
Haoyang Zou
Pengfei Liu
OffRL
LRM
39
3
0
30 Mar 2025
Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts
Jianhua Sun
Jiude Wei
Y. Li
Cewu Lu
LM&Ro
54
1
0
30 Mar 2025
InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding
Anastasiia Fadeeva
Vincent Coriou
Diego Antognini
C. Musat
Andrii Maksai
47
0
0
29 Mar 2025
Efficient Inference for Large Reasoning Models: A Survey
Y. Liu
Jiaying Wu
Yufei He
Hongcheng Gao
Hongyu Chen
Baolong Bi
Jiaheng Zhang
Zhiqi Huang
Bryan Hooi
LLMAG
LRM
67
7
0
29 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
130
0
0
28 Mar 2025
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
Belinda Z. Li
Been Kim
Z. Wang
LRM
38
2
0
28 Mar 2025
Process Reward Modeling with Entropy-Driven Uncertainty
Lang Cao
Renhong Chen
Yingtian Zou
Chao Peng
Wu Ning
...
Y. Wang
Peishuo Su
Mofan Peng
Zijie Chen
Yitong Li
34
0
0
28 Mar 2025
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
Haoxiang Sun
Yingqian Min
Z. Chen
Wayne Xin Zhao
Zheng Liu
Z. Wang
Lei Fang
Ji-Rong Wen
ELM
LRM
47
2
0
27 Mar 2025
Controlling Large Language Model with Latent Actions
Chengxing Jia
Ziniu Li
Pengyuan Wang
Yi-Chen Li
Zhenyu Hou
Yuxiao Dong
Y. Yu
56
0
0
27 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
82
14
0
27 Mar 2025
debug-gym: A Text-Based Environment for Interactive Debugging
Xingdi Yuan
Morgane M Moss
Charbel El Feghali
Chinmay Singh
Darya Moldavskaya
...
Lucas Page-Caccia
Matheus Pereira
Minseon Kim
Alessandro Sordoni
Marc-Alexandre Côté
LLMAG
68
1
0
27 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
W. Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Y. Zhuang
LM&Ro
LRM
65
3
0
27 Mar 2025
SWI: Speaking with Intent in Large Language Models
Yuwei Yin
EunJeong Hwang
Giuseppe Carenini
LRM
46
0
0
27 Mar 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun
Ge Yan
Tsui-Wei Weng
KELM
LRM
60
0
0
27 Mar 2025
Boosting Large Language Models with Mask Fine-Tuning
M. Zhang
Yue Bai
Huan Wang
Yizhou Wang
Qihua Dong
Y. Fu
CLL
53
0
0
27 Mar 2025
Entropy-Aware Branching for Improved Mathematical Reasoning
Xianzhi Li
Ethan Callanan
Xiaodan Zhu
Mathieu Sibue
Antony Papadimitriou
Mahmoud Mahfouz
Zhiqiang Ma
Xiaomo Liu
LRM
37
0
0
27 Mar 2025
Qwen2.5-Omni Technical Report
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
...
K. Dang
Bin Zhang
X. Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
86
15
0
26 Mar 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
OffRL
LRM
67
38
0
26 Mar 2025
RL-finetuning LLMs from on- and off-policy data with a single algorithm
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
42
1
0
25 Mar 2025
Learning to chain-of-thought with Jensen's evidence lower bound
Yunhao Tang
Sid Wang
Rémi Munos
BDL
OffRL
LRM
50
0
0
25 Mar 2025
Efficient Model Development through Fine-tuning Transfer
Pin-Jie Lin
Rishab Balasubramanian
Fengyuan Liu
Nikhil Kandpal
Tu Vu
61
0
0
25 Mar 2025
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
87
30
0
25 Mar 2025
Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking
Yuyao Ge
Shenghua Liu
Y. Wang
Lingrui Mei
Lizhe Chen
Baolong Bi
Xueqi Cheng
ReLM
LRM
49
2
0
25 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
J. Hockenmaier
Graham Neubig
Sean Welleck
LRM
50
2
0
25 Mar 2025
LogicLearner: A Tool for the Guided Practice of Propositional Logic Proofs
Amogh Inamdar
U. Macar
Michel Vazirani
Michael Tarnow
Zarina Mustapha
Natalia Dittren
Sam Sadeh
Nakul Verma
Ansaf Salleb-Aouissi
LRM
37
0
0
25 Mar 2025
Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer
Ivan Vulić
E. Ponti
145
0
0
25 Mar 2025
1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training
Han Zhao
Haotian Wang
Yiping Peng
Sitong Zhao
Xiaoyu Tian
Shuaiting Chen
Yunjie Ji
Xiangang Li
RALM
ReLM
LRM
73
8
0
25 Mar 2025
Scaling Laws of Synthetic Data for Language Models
Zeyu Qin
Qingxiu Dong
Xingxing Zhang
Li Dong
Xiaolong Huang
...
Hany Awadalla
Yi R. Fung
Weizhu Chen
Minhao Cheng
Furu Wei
SyDa
75
2
0
25 Mar 2025
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
Yunhao Tang
Kunhao Zheng
Gabriel Synnaeve
Rémi Munos
39
1
0
25 Mar 2025
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
Haebin Shin
Lei Ji
Xiao Liu
Yeyun Gong
52
0
0
24 Mar 2025
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
J. Li
Jie Zhou
Yutao Yang
Bihao Zhan
Qianjun Pan
Yuyang Ding
Qin Chen
Jiang Bo
Xin Lin
Liang He
LRM
57
0
0
24 Mar 2025
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
59
2
0
24 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
91
31
0
24 Mar 2025
Language Model Uncertainty Quantification with Attention Chain
Yinghao Li
Rushi Qiang
Lama Moukheiber
Chao Zhang
LRM
46
0
0
24 Mar 2025