ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.03874
  4. Cited By
Measuring Mathematical Problem Solving With the MATH Dataset

Measuring Mathematical Problem Solving With the MATH Dataset

5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
    ReLM
    FaML
ArXivPDFHTML

Papers citing "Measuring Mathematical Problem Solving With the MATH Dataset"

50 / 1,407 papers shown
Title
Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination
Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination
Mika Setälä
Pieta Sikström
Ville Heilala
T. Karkkainen
ELM
LRM
28
1
0
15 Apr 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Wei Xiong
Jiarui Yao
Yuhui Xu
Bo Pang
Lei Wang
...
Junnan Li
Nan Jiang
Tong Zhang
Caiming Xiong
Hanze Dong
OffRL
LRM
40
3
0
15 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
M. Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
146
0
1
15 Apr 2025
Efficient Reasoning Models: A Survey
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
145
0
0
15 Apr 2025
Weight Ensembling Improves Reasoning in Language Models
Weight Ensembling Improves Reasoning in Language Models
Xingyu Dang
Christina Baek
Kaiyue Wen
Zico Kolter
Aditi Raghunathan
MoMe
LRM
63
1
0
14 Apr 2025
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Zaid Khan
Elias Stengel-Eskin
Archiki Prasad
Jaemin Cho
Mohit Bansal
29
0
0
14 Apr 2025
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Ding Chen
Qingchen Yu
P. Wang
W. Zhang
Bo Tang
Feiyu Xiong
X. Li
Minchuan Yang
Z. Li
ALM
LRM
36
2
0
14 Apr 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
28
0
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
68
7
1
14 Apr 2025
RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability
RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability
Y. Zhang
Zihao Zeng
Dongbai Li
Yao Huang
Zhijie Deng
Yinpeng Dong
LRM
27
4
0
14 Apr 2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang
Wen-Ding Li
Daniele Paliotta
Daniel Ritter
Alexander M. Rush
Tri Dao
LRM
31
0
0
14 Apr 2025
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models
Aryan Shrivastava
Paula Akemi Aoyagui
29
0
0
14 Apr 2025
Efficient Process Reward Model Training via Active Learning
Efficient Process Reward Model Training via Active Learning
Keyu Duan
Zichen Liu
Xin Mao
Tianyu Pang
Changyu Chen
Qiguang Chen
Michael Shieh
Longxu Dou
LRM
25
1
0
14 Apr 2025
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
146
2
0
14 Apr 2025
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
Yutao Mou
Yuxiao Luo
Shikun Zhang
Wei Ye
LLMSV
LRM
36
0
0
13 Apr 2025
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Ram Mohan Rao Kadiyala
Siddartha Pullakhandam
Siddhant Gupta
Drishti Sharma
Jebish Purbey
Kanwal Mehreen
Muhammad Arham
Hamza Farooq
30
0
0
13 Apr 2025
Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance
Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance
Zuoli Tang
Junjie Ou
Kaiqin Hu
Chunwei Wu
Zhaoxin Huan
Chilin Fu
Xiaolu Zhang
Jun Zhou
Chenliang Li
ReLM
LRM
40
0
0
13 Apr 2025
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Chenghao Li
Chaoning Zhang
Yi Lu
J. Zhang
Qigan Sun
X. Wang
Jiwei Wei
Guoqing Wang
Yang Yang
H. Shen
LRM
60
1
0
13 Apr 2025
Can the capability of Large Language Models be described by human ability? A Meta Study
Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan
Yunquan Zhang
Boyang Zhang
Fangming Liu
Daning Cheng
ELM
LM&MA
55
0
0
13 Apr 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He
Jiahong Liu
Buze Zhang
N. Bui
Ali Maatouk
Menglin Yang
Irwin King
Melanie Weber
Rex Ying
29
0
0
11 Apr 2025
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
FangZhi Xu
Hang Yan
Chang Ma
Haiteng Zhao
Qiushi Sun
Kanzhi Cheng
Junxian He
Jun Liu
Zhiyong Wu
LRM
29
1
0
11 Apr 2025
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
Xin Gao
Qizhi Pei
Zinan Tang
Y. Li
Honglin Lin
Jiang Wu
C. He
Lijun Wu
SyDa
28
0
0
11 Apr 2025
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Yixin Cao
Jiahao Ying
Y. Wang
Xipeng Qiu
Xuanjing Huang
Yugang Jiang
ELM
36
2
0
10 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
C. Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
103
2
0
10 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
C. Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Wenhu Chen
OffRL
ReLM
SyDa
LRM
VLM
72
1
0
10 Apr 2025
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models
Ling Team
Caizhi Tang
Chilin Fu
Chunwei Wu
Jia Guo
...
Shuaicheng Li
Y. Zhang
Yingting Wu
Y. Liu
Zhenyu Huang
LRM
19
0
0
09 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
97
4
0
09 Apr 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan
Ming Li
Lichao Sun
Tianyi Zhou
LRM
51
3
0
09 Apr 2025
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
C. Xu
Wei Ping
P. Xu
Z. Liu
Boxin Wang
M. Shoeybi
Bo Li
Bryan Catanzaro
17
1
0
08 Apr 2025
SEA-LION: Southeast Asian Languages in One Network
SEA-LION: Southeast Asian Languages in One Network
Raymond Ng
Thanh Ngan Nguyen
Yuli Huang
Ngee Chia Tai
Wai Yi Leong
...
David Ong Tat-Wee
B. Liu
William-Chandra Tjhi
Erik Cambria
Leslie Teo
36
11
0
08 Apr 2025
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
N. Mudur
Hao Cui
Subhashini Venugopalan
Paul Raccuglia
M. Brenner
Peter C. Norgaard
LLMAG
ELM
LRM
40
0
0
08 Apr 2025
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang
Yulin Chen
Jane Pan
Chen Zhao
Aurojit Panda
Jinyang Li
He He
ReLM
LRM
44
2
0
07 Apr 2025
A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization
A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization
Wenyuan Xu
Xiaochen Zuo
Chao Xin
Yu Yue
Lin Yan
Yonghui Wu
OffRL
21
1
0
07 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Tianyi Zhou
Jieyu Zhao
LRM
78
1
0
07 Apr 2025
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
Runjin Chen
Zhenyu (Allen) Zhang
Junyuan Hong
Souvik Kundu
Zhangyang Wang
OffRL
LRM
47
2
0
07 Apr 2025
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
Will Cai
Tianneng Shi
Xuandong Zhao
Dawn Song
28
0
0
07 Apr 2025
Concise Reasoning via Reinforcement Learning
Concise Reasoning via Reinforcement Learning
Mehdi Fatemi
Banafsheh Rafiee
Mingjie Tang
Kartik Talamadupula
ReLM
OffRL
LRM
52
4
0
07 Apr 2025
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Minki Kang
Jongwon Jeong
Jaewoong Cho
ALM
LRM
41
2
0
07 Apr 2025
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
Fan Nie
Lan Feng
Haotian Ye
Weixin Liang
Pan Lu
Huaxiu Yao
Alexandre Alahi
James Zou
78
0
0
07 Apr 2025
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation
J. S. Park
J. Park
Dongju Jang
Jiwan Chung
Byungwoo Yoo
Jaewoo Shin
S. Park
Taehyeong Kim
Youngjae Yu
41
0
0
04 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
29
0
0
04 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
42
0
0
04 Apr 2025
MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance
MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance
Chen Hu
Timothy Neate
Shan Luo
Letizia Gionfrida
49
3
0
04 Apr 2025
Universal Collection of Euclidean Invariants between Pairs of Position-Orientations
Universal Collection of Euclidean Invariants between Pairs of Position-Orientations
Gijs Bellaard
B. Smets
R. Duits
59
0
0
04 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
R. Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Janet Liu
Y. Wu
OffRL
LRM
46
10
0
03 Apr 2025
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models
H. Le
Dai Do
D. Nguyen
Svetha Venkatesh
OffRL
LRM
36
0
0
03 Apr 2025
MegaMath: Pushing the Limits of Open Math Corpora
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou
Zengzhi Wang
Nikhil Ranjan
Zhoujun Cheng
Liping Tang
Guowei He
Zhengzhong Liu
Eric P. Xing
LRM
51
1
0
03 Apr 2025
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?
Andres Algaba
Vincent Holst
Floriano Tori
Melika Mobini
Brecht Verbeken
Sylvia Wenmackers
Vincent Ginis
38
1
0
03 Apr 2025
Scaling Test-time Compute for Low-resource Languages: Multilingual Reasoning in LLMs
Scaling Test-time Compute for Low-resource Languages: Multilingual Reasoning in LLMs
Khanh-Tung Tran
Barry O’Sullivan
Hoang D. Nguyen
LRM
32
0
0
02 Apr 2025
YourBench: Easy Custom Evaluation Sets for Everyone
YourBench: Easy Custom Evaluation Sets for Everyone
S. Kamath S
Clémentine Fourrier
Alina Lozovskia
Thomas Wolf
Gökhan Tür
Dilek Hakkani-Tür
35
2
0
02 Apr 2025
Previous
123456...272829
Next