ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.03874
  4. Cited By
Measuring Mathematical Problem Solving With the MATH Dataset

Measuring Mathematical Problem Solving With the MATH Dataset

5 March 2021
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
    ReLM
    FaML
ArXivPDFHTML

Papers citing "Measuring Mathematical Problem Solving With the MATH Dataset"

50 / 1,406 papers shown
Title
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang
Zhiyang Chen
Zijun Wang
Tiancheng Li
Guo-Jun Qi
DiffM
LRM
AI4CE
18
0
0
15 May 2025
Parallel Scaling Law for Language Models
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
37
0
0
15 May 2025
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
14
0
0
15 May 2025
WorldPM: Scaling Human Preference Modeling
WorldPM: Scaling Human Preference Modeling
B. Wang
Runji Lin
K. Lu
L. Yu
Z. Zhang
...
Xuanjing Huang
Yu Jiang
Bowen Yu
J. Zhou
Junyang Lin
19
0
0
15 May 2025
Beyond Áha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Beyond Áha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Zhiyuan Hu
Y. Wang
Hanze Dong
Yuhui Xu
Amrita Saha
Caiming Xiong
Bryan Hooi
Junnan Li
LRM
14
0
0
15 May 2025
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention
Jintian Shao
Hongyi Huang
Jiayi Wu
Beiwen Zhang
ZhiYu Wu
You Shan
MingKai Zheng
22
0
0
15 May 2025
Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency
Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency
Daniel Weitekamp
Christopher James Maclellan
Erik Harpstead
Kenneth R. Koedinger
14
0
0
15 May 2025
Multi-Token Prediction Needs Registers
Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos
Spyros Gidaris
N. Komodakis
24
0
0
15 May 2025
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
Xiwen Chen
Wenhui Zhu
Peijie Qiu
Xuanzhao Dong
Hao Wang
Haiyu Wu
Huayu Li
Aristeidis Sotiras
Y. Wang
Abolfazl Razi
ALM
37
0
0
14 May 2025
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
Enyu Zhao
Vedant Raval
Hejia Zhang
Jiageng Mao
Zeyu Shangguan
S. Nikolaidis
Y. Wang
Daniel Seita
LM&Ro
CoGe
45
0
0
14 May 2025
Qwen3 Technical Report
Qwen3 Technical Report
A. Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Z. Zhang
Z. Zhou
Z. Qiu
LLMAG
OSLM
LRM
40
0
0
14 May 2025
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
Xiaoyang Chen
Xinan Dai
Yu Du
Qian Feng
Naixu Guo
...
J. Xu
Yiyang Yu
Z. Yang
Hongji Zha
Ruichong Zhang
LRM
29
0
0
13 May 2025
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM
Nicholas Attolino
Alessio Capitanelli
Fulvio Mastrogiovanni
34
0
0
13 May 2025
Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection
Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection
Pei-Fu Guo
Yun-Da Tsai
Shou-De Lin
UD
46
0
0
12 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
23
0
0
12 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
B. S.
Cici
Dawei Zhu
...
Y. Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE
ReLM
LRM
AI4CE
37
0
0
12 May 2025
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
Junjie Ye
Caishuang Huang
Z. Chen
Wenjie Fu
Chenyuan Yang
...
Tao Gui
Qi Zhang
Zhongchao Shi
Jianping Fan
Xuanjing Huang
ALM
39
0
0
12 May 2025
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde
Martin Cousseau
Antoun Yaacoub
Lionel Prevost
MQ
23
0
0
12 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
X. Wu
Weinong Wang
Yingying Zhang
Wenqiang Zhang
ReLM
LRM
34
0
0
12 May 2025
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
Xiaotian Lin
Yanlin Qi
Yizhang Zhu
Themis Palpanas
Chengliang Chai
Nan Tang
Yuyu Luo
21
0
0
12 May 2025
xGen-small Technical Report
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
53
0
0
10 May 2025
AgentXploit: End-to-End Redteaming of Black-Box AI Agents
AgentXploit: End-to-End Redteaming of Black-Box AI Agents
Zhun Wang
Vincent Siu
Zhe Ye
Tianneng Shi
Yuzhou Nie
Xuandong Zhao
Chenguang Wang
Wenbo Guo
Dawn Song
LLMAG
AAML
36
0
0
09 May 2025
Stability in Single-Peaked Strategic Resource Selection Games
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
29
3
0
09 May 2025
CellVerse: Do Large Language Models Really Understand Cell Biology?
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang
Tianyu Liu
Zhihong Zhu
Hao Wu
H. Wang
Donghao Zhou
Yefeng Zheng
Kun Wang
X. Wu
Pheng-Ann Heng
ELM
36
0
0
09 May 2025
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka
Jimmy Dani
Ausmit Mondal
Nitesh Saxena
AAML
30
0
0
08 May 2025
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
Ziqing Qiao
Yongheng Deng
Jiali Zeng
Dong Wang
Lai Wei
Fandong Meng
Jie Zhou
Ju Ren
Yaoxue Zhang
LRM
49
0
0
08 May 2025
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Y. Wang
Z. Fu
Jie Cai
Peijun Tang
Hongya Lyu
...
Jie Zhou
Guoyang Zeng
Chaojun Xiao
Xu Han
Zhiyuan Liu
49
0
0
08 May 2025
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Harry Dong
Bilge Acun
Beidi Chen
Yuejie Chi
LRM
29
0
0
08 May 2025
Scalable Chain of Thoughts via Elastic Reasoning
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu
Hanze Dong
Lei Wang
Doyen Sahoo
Junnan Li
Caiming Xiong
OffRL
LRM
51
1
0
08 May 2025
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Qi Liu
Xinhao Zheng
Renqiu Xia
Xingzhi Qi
Qinxiang Cao
Junchi Yan
AIMat
50
0
0
07 May 2025
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Enbo Zhao
Yi Shen
Shuming Shi
Jieyun Huang
Z. Chen
Ning Wang
Siqi Xiao
J. Zhang
Kai Wang
Shiguo Lian
MQ
44
0
0
05 May 2025
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Junlin Wang
Roy Xie
Shang Zhu
Jue Wang
Ben Athiwaratkun
Bhuwan Dhingra
S. Song
Ce Zhang
James Y. Zou
ALM
31
0
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Kazuki Fujii
Yukito Tajima
Sakae Mizuki
Hinari Shimada
Taihei Shiotani
...
Kakeru Hattori
Youmi Ma
Hiroya Takamura
Rio Yokota
Naoaki Okazaki
SyDa
49
0
0
05 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
50
0
0
05 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
59
0
0
05 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Yike Guo
ELM
125
0
0
04 May 2025
Accelerating Large Language Model Reasoning via Speculative Search
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
M. Yuan
Jianye Hao
Feng Wu
ReLM
LRM
59
0
0
03 May 2025
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
Daniel Weitekamp
M. N. Siddiqui
Christopher James Maclellan
LLMAG
ELM
32
0
0
02 May 2025
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Z. Chen
Bo Li
Haifeng Xu
112
0
0
02 May 2025
Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation
Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation
Madhav Kotecha
Vijendra Kumar Vaishya
Smita Gautam
Suraj Racha
32
0
0
02 May 2025
DeepCritic: Deliberate Critique with Large Language Models
DeepCritic: Deliberate Critique with Large Language Models
Wenkai Yang
Jingwen Chen
Yankai Lin
Ji-Rong Wen
ALM
LRM
30
0
0
01 May 2025
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Yeseong Kim
45
0
0
01 May 2025
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai
Zhiqiang Xie
Kayvon Fatahalian
LLMAG
75
0
0
01 May 2025
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis
Daria Gitman
Igor Gitman
Evelina Bakhturina
SyDa
49
0
0
01 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
69
0
0
01 May 2025
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
H. Luo
Haiying He
Y. Wang
Jinluan Yang
Rui Liu
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
Li Shen
LRM
26
0
0
30 Apr 2025
AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models
AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models
Yinghui He
A. Panigrahi
Yong Lin
Sanjeev Arora
38
0
0
30 Apr 2025
Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges
Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges
Xiao Xiao
Yu Su
Sijing Zhang
Zhang Chen
Yadong Chen
Tian Liu
32
0
0
30 Apr 2025
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
Jinyan Su
Jennifer Healey
Preslav Nakov
Claire Cardie
LRM
141
0
0
30 Apr 2025
1234...272829
Next