2103.03874
Cited By
Measuring Mathematical Problem Solving With the MATH Dataset
5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
Papers citing "Measuring Mathematical Problem Solving With the MATH Dataset"
50 / 1,407 papers shown
Title
Long Is More Important Than Difficult for Training Reasoning Models
Si Shen
Fei Huang
Zhixiao Zhao
C. Liu
Tiansheng Zheng
Danhao Zhu
AIMat
RALM
LRM
57
0
0
23 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
42
0
0
22 Mar 2025
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
Codefuse
Ling Team
Wenting Cai
Yuchen Cao
C. Chen
...
Wei Zhang
Z. Zhang
Hailin Zhao
Xunjin Zheng
Jun Zhou
ALM
MoE
49
0
0
22 Mar 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan
Yu-Hu Li
Honglin Lin
Qizhi Pei
Zinan Tang
Wei Yu Wu
Chenlin Ming
H. V. Zhao
Conghui He
Lijun Wu
LRM
59
0
0
21 Mar 2025
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
Mingyang Song
Mao Zheng
Zheng Li
Wenjie Yang
Xuan Luo
Yue Pan
Feng Zhang
ReLM
LRM
80
4
0
21 Mar 2025
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
Yijia Luo
Yulin Song
Xingyao Zhang
Jiaheng Liu
Weixun Wang
Gengru Chen
Wenbo Su
Bo Zheng
LRM
58
4
0
20 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
93
7
0
20 Mar 2025
DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs
Masoud Hashemi
Oluwanifemi Bamgbose
Sathwik Tejaswi Madhusudhan
Jishnu Sethumadhavan Nair
Aman Tiwari
Vikas Yadav
ReLM
ELM
LRM
72
2
0
20 Mar 2025
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Quy-Anh Dang
Chris Ngo
OffRL
LRM
52
9
0
20 Mar 2025
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu
Yan Zheng
Rong Cheng
Qiyu Wu
Wei Guo
...
Hebin Liang
Yifu Yuan
Hangyu Mao
Fuzheng Zhang
Jianye Hao
LRM
AI4CE
59
1
0
20 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen Zhong
Hanjie Chen
OffRL
ReLM
LRM
74
26
0
20 Mar 2025
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
Chen Li
Nazhou Liu
Kai Yang
38
3
0
20 Mar 2025
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer
Honglin Lin
Zhuoshi Pan
Yu-Hu Li
Qizhi Pei
Xin Gao
Mengzhang Cai
Conghui He
Lijun Wu
OffRL
LRM
57
0
0
19 Mar 2025
MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models
J. Li
Lu Yu
Qing Cui
Zhiqiang Zhang
Jun Zhou
Yanfang Ye
Chuxu Zhang
59
0
0
19 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
44
1
0
19 Mar 2025
COPA: Comparing the Incomparable to Explore the Pareto Front
Adrián Javaloy
Antonio Vergari
Isabel Valera
62
0
0
18 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
60
0
0
18 Mar 2025
The KoLMogorov Test: Compression by Code Generation
Ori Yoran
Kunhao Zheng
Fabian Gloeckle
Jonas Gehring
Gabriel Synnaeve
Taco Cohen
62
1
0
18 Mar 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Xinyu Fang
Z. Chen
Kai Lan
Lixin Ma
Shengyuan Ding
...
Zicheng Zhang
Guofeng Zhang
Haodong Duan
K. Chen
D. Lin
MLLM
63
1
0
18 Mar 2025
Command R7B Arabic: A Small, Enterprise Focused, Multilingual, and Culturally Aware Arabic LLM
Yazeed Alnumay
Alexandre Barbet
Anna Bialas
William Darling
Shaan Desai
...
Stephanie Howe
Olivia Lasche
Justin Lee
Anirudh Shrinivason
Jennifer Tracey
89
0
0
18 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
89
2
0
18 Mar 2025
Measuring In-Context Computation Complexity via Hidden State Prediction
Vincent Herrmann
Róbert Csordás
Jürgen Schmidhuber
41
0
0
17 Mar 2025
ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Fangzhi Xu
Hang Yan
Chang Ma
Haiteng Zhao
Jun Liu
Qika Lin
Zhiyong Wu
47
2
0
17 Mar 2025
DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective
Dengyun Peng
Yuhang Zhou
Qiguang Chen
Jinhao Liu
Jingjing Chen
L. Qin
56
0
0
17 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
52
2
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
88
2
0
17 Mar 2025
Pensez: Less Data, Better Reasoning -- Rethinking French LLM
Huy Hoang Ha
ReLM
LRM
68
1
0
17 Mar 2025
RaSA: Rank-Sharing Low-Rank Adaptation
Zhiwei He
Zhaopeng Tu
Xing Wang
Xingyu Chen
Z. Wang
Jiahao Xu
Tian Liang
Wenxiang Jiao
Z. Zhang
Rui Wang
ALM
85
1
0
16 Mar 2025
A Survey on the Optimization of Large Language Model-based Agents
Shangheng Du
Jiabao Zhao
Jinxin Shi
Zhentao Xie
Xin Jiang
Yanhong Bai
Liang He
LLMAG
LM&Ro
LM&MA
206
0
0
16 Mar 2025
HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs
Tsz Chung Cheng
Chung Shing Cheng
Chaak Ming Lau
Eugene Tin-Ho Lam
Chun Yat Wong
Hoi On Yu
Cheuk Hei Chong
ELM
59
1
0
16 Mar 2025
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity
Jing Bi
Junjia Guo
Susan Liang
Guangyu Sun
Luchuan Song
...
Jinxi He
Jiarui Wu
A. Vosoughi
C. L. P. Chen
Chenliang Xu
LRM
74
1
0
14 Mar 2025
GNNs as Predictors of Agentic Workflow Performances
Y. Zhang
Yuchen Hou
Bohan Tang
Shuo Chen
Muhan Zhang
Xiaowen Dong
S. Chen
LLMAG
AI4CE
60
0
0
14 Mar 2025
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
LRM
41
0
0
14 Mar 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang
Xiaoxuan He
Hongkun Pan
Xiyan Jiang
Yan Deng
...
Dacheng Yin
Fengyun Rao
Minfeng Zhu
Bo Zhang
Wei Chen
VLM
LRM
54
23
1
13 Mar 2025
OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model
Bowen Zhang
Pengcheng Luo
LRM
AI4CE
LLMAG
75
1
0
13 Mar 2025
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
S. M. I. Simon X. Yang
C. Wang
Yidong Wang
Xiaotao Gu
Minlie Huang
J. Tang
LRM
LLMAG
61
0
0
13 Mar 2025
Numerical Error Analysis of Large Language Models
Stanislav Budzinskiy
Wenyi Fang
Longbin Zeng
Philipp Petersen
47
1
0
13 Mar 2025
Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models
Afrar Jahin
Arif Hassan Zidan
Yu Bao
Shizhe Liang
T. Liu
W. Zhang
LRM
70
1
0
13 Mar 2025
"Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding
Hyunbin Jin
Je Won Yeom
Seunghyun Bae
Taesup Kim
LRM
ReLM
42
1
0
13 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
60
10
0
13 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Ziyu Wan
Yunxiang Li
Y. Song
Hanjing Wang
Linyi Yang
Mark W. Schmidt
J. Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG
KELM
LRM
AI4CE
86
6
0
12 Mar 2025
Reinforcement Learning is all You Need
Yongsheng Lian
ReLM
OffRL
LRM
70
0
0
12 Mar 2025
MindGYM: Enhancing Vision-Language Models via Synthetic Self-Challenging Questions
Zhe Xu
Daoyuan Chen
Zhenqing Ling
Yaliang Li
Ying Shen
ReLM
SyDa
LRM
55
0
0
12 Mar 2025
DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
Minjun Zhu
Yixuan Weng
Linyi Yang
Yue Zhang
ALM
LRM
63
2
0
11 Mar 2025
Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
Runxi Cheng
Feng Xiong
Yongxian Wei
Wanyun Zhu
Chun Yuan
MoMe
68
0
0
11 Mar 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Iván Arcuschin
Jett Janiak
Robert Krzyzanowski
Senthooran Rajamanoharan
Neel Nanda
Arthur Conmy
LRM
ReLM
62
6
0
11 Mar 2025
RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware
Gonzalo Santamaría Gómez
Guillem García Subies
Pablo Gutiérrez Ruiz
Mario González Valero
Natàlia Fuertes
...
Nuria Aldama García
David Betancur Sánchez
Kateryna Sushkova
Marta Guerrero Nieto
Á. Jiménez
51
0
0
11 Mar 2025
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness
Ce Guo
Tong Zhao
61
1
0
11 Mar 2025
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Zhiyuan Zeng
Yizhong Wang
Hannaneh Hajishirzi
Pang Wei Koh
ELM
53
4
0
11 Mar 2025
Dynamic Path Navigation for Motion Agents with LLM Reasoning
Yubo Zhao
Qi Wu
Yifan Wang
Yu-Wing Tai
Chi-Keung Tang
LRM
LLMAG
161
0
0
10 Mar 2025