ResearchTrend.AI

arXiv: 2404.17140

Small Language Models Need Strong Verifiers to Self-Correct Reasoning

26 April 2024
Yunxiang Zhang
Muhammad Khalifa
Lajanugen Logeswaran
Jaekyeom Kim
Moontae Lee
Honglak Lee
Lu Wang
    LRM
    KELM
    ReLM

Papers citing "Small Language Models Need Strong Verifiers to Self-Correct Reasoning"

30 / 30 papers shown
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
44
1
0
23 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq R. Joty
ELM
ALM
LRM
50
2
0
21 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
142
0
0
15 Apr 2025
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Wang Yang
Xiang Yue
V. Chaudhary
Xiaotian Han
ReLM
LRM
70
1
0
12 Apr 2025
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang
Yulin Chen
Jane Pan
Chen Zhao
Aurojit Panda
Jinyang Li
He He
ReLM
LRM
44
2
0
07 Apr 2025
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Liangjie Huang
Dawei Li
Huan Liu
Lu Cheng
LRM
34
0
0
03 Apr 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
52
0
0
29 Mar 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan
Yu-Hu Li
Honglin Lin
Qizhi Pei
Zinan Tang
Wei Yu Wu
Chenlin Ming
H. V. Zhao
Conghui He
Lijun Wu
LRM
59
0
0
21 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen Zhong
Hanjie Chen
OffRL
ReLM
LRM
74
26
0
20 Mar 2025
S^3cMath: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
Yuchen Yan
Jin Jiang
Yang Liu
Yixin Cao
Xin Xu
M. Zhang
Xunliang Cai
Jian Shao
ReLM
LRM
KELM
112
7
0
21 Feb 2025
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma
Peisong Wang
Cheng Liu
Xingyan Liu
Jiaqi Chen
Bang Zhang
Xin Zhou
Nan Du
Jia Li
LRM
57
2
0
18 Feb 2025
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Yunhua Zhou
Xipeng Qiu
LRM
78
9
0
17 Feb 2025
Towards Reasoning Ability of Small Language Models
Gaurav Srivastava
Shuxiang Cao
Xuan Wang
ReLM
LRM
49
4
0
17 Feb 2025
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen
Guangtao Zeng
Zhenting Qi
Zhang-Wei Hong
Zhenfang Chen
Wei Lu
G. Wornell
Subhro Das
David D. Cox
Chuang Gan
LLMAG
LRM
157
5
0
04 Feb 2025
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
Xiangru Tang
Tianyu Hu
Muyang Ye
Yanjun Shao
Xunjian Yin
...
Pan Lu
Zhuosheng Zhang
Yilun Zhao
Arman Cohan
Mark B. Gerstein
LLMAG
LRM
AI4CE
66
6
0
11 Jan 2025
Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning
Huchen Jiang
Yangyang Ma
Chaofan Ding
Kexin Luan
Xinhan Di
ReLM
LRM
36
2
0
23 Dec 2024
Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu
Yin Yu
Xiaotian Han
Runchao Li
Xianxuan Long
Haotian Yu
Pan Li
SyDa
57
0
0
25 Nov 2024
Rationale-Aware Answer Verification by Pairwise Self-Evaluation
Akira Kawabata
Saku Sugawara
LRM
31
3
0
07 Oct 2024
Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter?
Nemika Tyagi
Mihir Parmar
Mohith Kulkarni
Aswin Rrv
Nisarg Patel
Mutsumi Nakamura
Arindam Mitra
Chitta Baral
LRM
35
6
0
20 Jul 2024
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning
Joykirat Singh
A. Nambi
Vibhav Vineet
LRM
37
5
0
16 Jun 2024
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi
Yusen Zhang
Nan Zhang
Jiawei Han
Rui Zhang
LRM
47
57
0
03 Jun 2024
A Theoretical Understanding of Self-Correction through In-context Alignment
Yifei Wang
Yuyang Wu
Zeming Wei
Stefanie Jegelka
Yisen Wang
LRM
36
13
0
28 May 2024
Can We Verify Step by Step for Incorrect Answer Detection?
Xin Xu
Shizhe Diao
Can Yang
Yang Wang
LRM
122
13
0
16 Feb 2024
The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate
Juhyun Oh
Eunsu Kim
Inha Cha
Alice H. Oh
ELM
37
8
0
09 Feb 2024
ULTRA: Unleash LLMs' Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Self-Refinement
Xinliang Frederick Zhang
Carter Blum
Temma Choji
Shalin S Shah
Alakananda Vempala
59
6
0
24 Jan 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
235
298
0
18 Jan 2024
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai
Zeqiu Wu
Yizhong Wang
Avirup Sil
Hannaneh Hajishirzi
RALM
159
624
0
17 Oct 2023
Cumulative Reasoning with Large Language Models
Yifan Zhang
Jingqin Yang
Yang Yuan
Andrew Chi-Chih Yao
ReLM
ELM
LRM
AI4CE
36
67
0
08 Aug 2023
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
310
4,077
0
24 May 2022
RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge
Bill Yuchen Lin
Ziyi Wu
Yichi Yang
Dong-Ho Lee
Xiang Ren
ReLM
LRM
236
64
0
02 Jan 2021