ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.03874
  4. Cited By
Measuring Mathematical Problem Solving With the MATH Dataset

Measuring Mathematical Problem Solving With the MATH Dataset

5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
    ReLM
    FaML
ArXivPDFHTML

Papers citing "Measuring Mathematical Problem Solving With the MATH Dataset"

50 / 1,407 papers shown
Title
Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges
Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges
Xiao Xiao
Yu Su
Sijing Zhang
Zhang Chen
Yadong Chen
Tian Liu
32
0
0
30 Apr 2025
Turing Machine Evaluation for Large Language Model
Turing Machine Evaluation for Large Language Model
Haitao Wu
Zongbo Han
Huaxi Huang
Changqing Zhang
ELM
LRM
62
0
0
29 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
118
2
0
29 Apr 2025
RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library
RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library
J. Wang
Jinhao Jiang
Zhiqiang Zhang
Jun Zhou
Wayne Xin Zhao
SyDa
53
0
0
29 Apr 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG
KELM
OffRL
LRM
132
0
0
28 Apr 2025
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
J. Zhang
Flood Sung
Z. Yang
Yang Gao
Chongjie Zhang
LLMAG
40
0
0
28 Apr 2025
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
Adam Younsi
Abdalgader Abubaker
M. Seddik
Hakim Hacid
Salem Lahlou
LRM
57
0
0
28 Apr 2025
Security Steerability is All You Need
Security Steerability is All You Need
Itay Hazan
Idan Habler
Ron Bitton
Itsik Mantin
AAML
80
0
0
28 Apr 2025
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Takuya Tamura
Taro Yano
Masafumi Enomoto
M. Oyamada
39
0
0
28 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
X. Li
Kwan-Yee Kenneth Wong
LLMAG
ReLM
LRM
86
0
0
27 Apr 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
Y. Li
Qizhi Pei
Mengyuan Sun
Honglin Lin
Chenlin Ming
Xin Gao
Jiang Wu
C. He
Lijun Wu
ELM
LRM
40
0
0
27 Apr 2025
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang
J. Li
Lijun Wu
M. Zhang
LLMAG
LRM
64
1
0
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
1
0
26 Apr 2025
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Y. Wang
Pei Zhang
Jialong Tang
H. Wei
Baosong Yang
...
Y. Zhang
Fei Huang
Junyang Lin
Fei Huang
Jingren Zhou
LRM
52
0
0
25 Apr 2025
Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics
Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics
Zena Al-Khalili
Nick Howell
Dietrich Klakow
LRM
29
0
0
24 Apr 2025
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Zhikai Wang
Jiashuo Sun
W. Zhang
Zhiqiang Hu
Xin Li
F. Wang
Deli Zhao
VLM
LRM
75
0
0
24 Apr 2025
MAGIC: Near-Optimal Data Attribution for Deep Learning
MAGIC: Near-Optimal Data Attribution for Deep Learning
Andrew Ilyas
Logan Engstrom
TDI
39
0
0
23 Apr 2025
Process Reward Models That Think
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
44
1
0
23 Apr 2025
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Bartosz Piotrowski
Witold Drzewakowski
Konrad Staniszewski
Piotr Miłoś
LRM
36
0
0
23 Apr 2025
Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Case Study
Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Case Study
Mohammad Khodadad
Ali Shiraee Kasmaee
Mahdi Astaraki
Nicholas Sherck
H. Mahyar
Soheila Samiee
LRM
128
0
0
23 Apr 2025
Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Mirian Hipolito Garcia
Camille Couturier
Daniel Madrigal Diaz
Ankur Mallick
Anastasios Kyrillidis
Robert Sim
Victor Rühle
Saravan Rajmohan
30
0
0
23 Apr 2025
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
ParamΔΔΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
80
0
0
23 Apr 2025
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
Ivan Moshkov
Darragh Hanley
Ivan Sorokin
Shubham Toshniwal
Christof Henkel
Benedikt D. Schifferer
Wei Du
Igor Gitman
ReLM
LRM
40
1
0
23 Apr 2025
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
Jie Zhu
Qian Chen
Huaixia Dou
Junhui Li
Lifan Guo
Feng-Xiang Chen
C. Zhang
LRM
29
0
0
22 Apr 2025
TTRL: Test-Time Reinforcement Learning
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Shang Qu
Li Sheng
Xuekai Zhu
Biqing Qi
Youbang Sun
Ganqu Cui
Ning Ding
Bowen Zhou
OffRL
132
1
0
22 Apr 2025
Tina: Tiny Reasoning Models via LoRA
Tina: Tiny Reasoning Models via LoRA
Shangshang Wang
Julian Asilis
Ömer Faruk Akgül
Enes Burak Bilgin
Ollie Liu
W. Neiswanger
OffRL
LRM
35
1
0
22 Apr 2025
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Yuxin Jiang
Y. Wang
Chuhan Wu
Xinyi Dai
Yan Xu
...
Y. Wang
Xin Jiang
Lifeng Shang
R. Tang
W. Wang
29
0
0
22 Apr 2025
Dynamic Early Exit in Reasoning Models
Dynamic Early Exit in Reasoning Models
Chenxu Yang
Qingyi Si
Yongjie Duan
Zheliang Zhu
Chenyu Zhu
Zheng-Shen Lin
Li Cao
Weiping Wang
ReLM
LRM
32
0
0
22 Apr 2025
Learning to Reason under Off-Policy Guidance
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
41
0
0
21 Apr 2025
EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
Yao Shi
Rongkeng Liang
Yong Xu
LLMAG
AI4Ed
ELM
62
0
0
21 Apr 2025
Trillion 7B Technical Report
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
107
0
0
21 Apr 2025
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
49
2
0
21 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
J. Wang
W. Zhang
OffRL
31
0
0
21 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
J. Z. Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
145
0
0
21 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq R. Joty
ELM
ALM
LRM
50
2
0
21 Apr 2025
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Shihan Dou
Muling Wu
Jingwen Xu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL
LRM
27
0
0
19 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
46
11
0
18 Apr 2025
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Junlin Wang
Shang Zhu
Jon Saad-Falcon
Ben Athiwaratkun
Qingyang Wu
Jue Wang
S. Song
Ce Zhang
Bhuwan Dhingra
James Y. Zou
LRM
43
1
0
18 Apr 2025
D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Model
D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Model
Grace Byun
Jinho D. Choi
EGVM
40
0
0
18 Apr 2025
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
J. T. Wang
Jin Jiang
Yang Liu
M. Zhang
Xunliang Cai
LRM
34
0
0
18 Apr 2025
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Xiao Pu
Michael Stephen Saxon
Wenyue Hua
William Yang Wang
LRM
28
0
0
17 Apr 2025
Antidistillation Sampling
Antidistillation Sampling
Yash Savani
Asher Trockman
Zhili Feng
Avi Schwarzschild
Alexander Robey
Marc Finzi
J. Zico Kolter
44
0
0
17 Apr 2025
ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
Haidar Khan
H. A. Alyahya
Yazeed Alnumay
M Saiful Bari
B. Yener
ELM
LRM
57
0
0
17 Apr 2025
GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning
GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning
Liangyu Xu
Yingxiu Zhao
J. Wang
Yingyao Wang
Bu Pi
...
Jihao Gu
X. Li
Xiaoyong Zhu
Jun Song
Bo Zheng
LRM
156
1
0
17 Apr 2025
ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs
ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs
Yan Yang
Yixia Li
Hongru Wang
Xuetao Wei
Jianqiao Yu
Yun-Nung Chen
Guanhua Chen
MoMe
28
0
0
17 Apr 2025
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Mehmet Hamza Erol
Batu El
Mirac Suzgun
Mert Yuksekgonul
J. Zou
ELM
35
0
0
17 Apr 2025
FLIP Reasoning Challenge
FLIP Reasoning Challenge
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
VLM
LRM
72
0
0
16 Apr 2025
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
Yiyou Sun
Georgia Zhou
H. Wang
D. Li
Nouha Dziri
Dawn Song
ReLM
ALM
ELM
LRM
72
0
1
16 Apr 2025
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation
Shizhan Cai
Liang Ding
Dacheng Tao
WaLM
52
0
0
16 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM
LRM
AI4CE
42
1
0
16 Apr 2025
Previous
12345...272829
Next