ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.22660
  4. Cited By
Maximizing Confidence Alone Improves Reasoning
v1v2 (latest)

Maximizing Confidence Alone Improves Reasoning

28 May 2025
Mihir Prabhudesai
Lili Chen
Alex Ippoliti
Katerina Fragkiadaki
Hao Liu
Deepak Pathak
    OODOffRLReLMLRM
ArXiv (abs)PDFHTML

Papers citing "Maximizing Confidence Alone Improves Reasoning"

47 / 47 papers shown
Title
Reasoning Models Better Express Their Confidence
Reasoning Models Better Express Their Confidence
Dongkeun Yoon
Seungone Kim
Sohee Yang
Sunkyoung Kim
Soyeon Kim
Yongil Kim
Eunbi Choi
Yireun Kim
Minjoon Seo
LRM
35
3
0
20 May 2025
TTRL: Test-Time Reinforcement Learning
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
390
31
0
22 Apr 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLMVLMOffRLAI4TSLRM
373
1,967
0
22 Jan 2025
On Verbalized Confidence Scores for LLMs
On Verbalized Confidence Scores for LLMs
Daniel Yang
Yao-Hung Hubert Tsai
M. Yamada
87
10
0
19 Dec 2024
A Survey of Calibration Process for Black-Box LLMs
A Survey of Calibration Process for Black-Box LLMs
Liangru Xie
Hui Liu
Jingying Zeng
Xianfeng Tang
Yan Han
Chen Luo
Jing Huang
Zhen Li
Suhang Wang
Qi He
125
3
0
17 Dec 2024
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective
  Rationales
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Tianyang Xu
Shujin Wu
Shizhe Diao
Xiaoze Liu
Xingyao Wang
Yangyi Chen
Jing Gao
LRM
79
43
0
31 May 2024
Calibration of Large Language Models on Code Summarization
Calibration of Large Language Models on Code Summarization
Yuvraj Virk
Prem Devanbu
Toufique Ahmed
89
11
0
30 Apr 2024
Language Model Cascades: Token-level uncertainty and beyond
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
126
51
0
15 Apr 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with
  Olympiad-Level Bilingual Multimodal Scientific Problems
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELMAIMat
115
273
0
21 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open
  Language Models
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLMLRM
138
1,238
0
05 Feb 2024
Calibration and Correctness of Language Models for Code
Calibration and Correctness of Language Models for Code
Claudio Spiess
David Gros
Kunal Suresh Pai
Michael Pradel
Md Rafiqul Islam Rabin
Amin Alipour
Susmit Jha
Prem Devanbu
Toufique Ahmed
91
26
0
03 Feb 2024
Towards Uncertainty-Aware Language Agent
Towards Uncertainty-Aware Language Agent
Paul Burgess
Wray Buntine
Ehsan Shareghi
LLMAGAI4CE
85
6
0
25 Jan 2024
Beyond Human Data: Scaling Self-Training for Problem-Solving with
  Language Models
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Avi Singh
John D. Co-Reyes
Rishabh Agarwal
Ankesh Anand
Piyush Patil
...
Yamini Bansal
Ethan Dyer
Behnam Neyshabur
Jascha Narain Sohl-Dickstein
Noah Fiedel
ALMLRMReLMSyDa
213
189
0
11 Dec 2023
Calibrated Language Models Must Hallucinate
Calibrated Language Models Must Hallucinate
Adam Tauman Kalai
Santosh Vempala
HILM
63
84
0
24 Nov 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MHELM
107
682
0
20 Nov 2023
Decomposing Uncertainty for Large Language Models through Input
  Clarification Ensembling
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling
Bairu Hou
Yujian Liu
Kaizhi Qian
Jacob Andreas
Shiyu Chang
Yang Zhang
UDUQCVPER
65
63
0
15 Nov 2023
A Survey of Confidence Estimation and Calibration in Large Language
  Models
A Survey of Confidence Estimation and Calibration in Large Language Models
Jiahui Geng
Fengyu Cai
Yuxia Wang
Heinz Koeppl
Preslav Nakov
Iryna Gurevych
UQCV
114
77
0
14 Nov 2023
Variational Curriculum Reinforcement Learning for Unsupervised Discovery
  of Skills
Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
Seongun Kim
Kyowoon Lee
Jaesik Choi
SSLDRL
70
10
0
30 Oct 2023
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta
Nils Blach
Aleš Kubíček
Robert Gerstenberger
Michal Podstawski
...
Joanna Gajda
Tomasz Lehmann
H. Niewiadomski
Piotr Nyczyk
Torsten Hoefler
LRMAI4CELM&Ro
145
671
0
18 Aug 2023
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of
  LLMs by Validating Low-Confidence Generation
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Neeraj Varshney
Wenlin Yao
Hongming Zhang
Jianshu Chen
Dong Yu
HILM
109
173
0
08 Jul 2023
Can LLMs Express Their Uncertainty? An Empirical Evaluation of
  Confidence Elicitation in LLMs
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Miao Xiong
Zhiyuan Hu
Xinyang Lu
Yifei Li
Jie Fu
Junxian He
Bryan Hooi
200
446
0
22 Jun 2023
Let's Verify Step by Step
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALMOffRLLRM
193
1,228
0
31 May 2023
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence
  Scores from Language Models Fine-Tuned with Human Feedback
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian
E. Mitchell
Allan Zhou
Archit Sharma
Rafael Rafailov
Huaxiu Yao
Chelsea Finn
Christopher D. Manning
110
354
0
24 May 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao
Dian Yu
Jeffrey Zhao
Izhak Shafran
Thomas Griffiths
Yuan Cao
Karthik Narasimhan
LM&RoLRMAI4CE
147
2,010
0
17 May 2023
Active Retrieval Augmented Generation
Active Retrieval Augmented Generation
Zhengbao Jiang
Frank F. Xu
Luyu Gao
Zhiqing Sun
Qian Liu
Jane Dwivedi-Yu
Yiming Yang
Jamie Callan
Graham Neubig
RALM
66
288
0
11 May 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for
  Generative Large Language Models
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark Gales
HILMLRM
189
439
0
15 Mar 2023
Solving math word problems with process- and outcome-based feedback
Solving math word problems with process- and outcome-based feedback
J. Uesato
Nate Kushman
Ramana Kumar
Francis Song
Noah Y. Siegel
L. Wang
Antonia Creswell
G. Irving
I. Higgins
FaMLReLMAIMatLRM
106
355
0
25 Nov 2022
Language Models (Mostly) Know What They Know
Language Models (Mostly) Know What They Know
Saurav Kadavath
Tom Conerly
Amanda Askell
T. Henighan
Dawn Drain
...
Nicholas Joseph
Benjamin Mann
Sam McCandlish
C. Olah
Jared Kaplan
ELM
122
826
0
11 Jul 2022
STaR: Bootstrapping Reasoning With Reasoning
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLMLRM
142
508
0
28 Mar 2022
Competition-Level Code Generation with AlphaCode
Competition-Level Code Generation with AlphaCode
Yujia Li
David Choi
Junyoung Chung
Nate Kushman
Julian Schrittwieser
...
Esme Sutherland Robson
Pushmeet Kohli
Nando de
Koray Kavukcuoglu
Oriol Vinyals
143
1,403
0
08 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&RoLRMAI4CEReLM
820
9,576
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
314
4,533
0
27 Oct 2021
Behavior From the Void: Unsupervised Active Pre-Training
Behavior From the Void: Unsupervised Active Pre-Training
Hao Liu
Pieter Abbeel
VLMSSL
91
202
0
08 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLMFaML
181
2,356
0
05 Mar 2021
Reinforcement Learning with Prototypical Representations
Reinforcement Learning with Prototypical Representations
Denis Yarats
Rob Fergus
A. Lazaric
Lerrel Pinto
SSL
73
226
0
22 Feb 2021
Improving robustness against common corruptions by covariate shift
  adaptation
Improving robustness against common corruptions by covariate shift adaptation
Steffen Schneider
E. Rusak
L. Eck
Oliver Bringmann
Wieland Brendel
Matthias Bethge
VLM
99
482
0
30 Jun 2020
Evaluating Prediction-Time Batch Normalization for Robustness under
  Covariate Shift
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift
Zachary Nado
Shreyas Padhy
D. Sculley
Alexander DÁmour
Balaji Lakshminarayanan
Jasper Snoek
OODAI4TS
92
248
0
19 Jun 2020
MixMatch: A Holistic Approach to Semi-Supervised Learning
MixMatch: A Holistic Approach to Semi-Supervised Learning
David Berthelot
Nicholas Carlini
Ian Goodfellow
Nicolas Papernot
Avital Oliver
Colin Raffel
145
3,033
0
06 May 2019
Semi-supervised Domain Adaptation via Minimax Entropy
Semi-supervised Domain Adaptation via Minimax Entropy
Kuniaki Saito
Donghyun Kim
Stan Sclaroff
Trevor Darrell
Kate Saenko
75
622
0
13 Apr 2019
Exploration by Random Network Distillation
Exploration by Random Network Distillation
Yuri Burda
Harrison Edwards
Amos Storkey
Oleg Klimov
159
1,342
0
30 Oct 2018
A DIRT-T Approach to Unsupervised Domain Adaptation
A DIRT-T Approach to Unsupervised Domain Adaptation
Rui Shu
Hung Bui
Hirokazu Narui
Stefano Ermon
75
625
0
23 Feb 2018
Diversity is All You Need: Learning Skills without a Reward Function
Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach
Abhishek Gupta
Julian Ibarz
Sergey Levine
103
1,088
0
16 Feb 2018
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
526
19,237
0
20 Jul 2017
Deep reinforcement learning from human preferences
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
216
3,364
0
12 Jun 2017
Curiosity-driven Exploration by Self-supervised Prediction
Curiosity-driven Exploration by Self-supervised Prediction
Deepak Pathak
Pulkit Agrawal
Alexei A. Efros
Trevor Darrell
LRMSSL
113
2,449
0
15 May 2017
AutoDIAL: Automatic DomaIn Alignment Layers
AutoDIAL: Automatic DomaIn Alignment Layers
Fabio Maria Carlucci
Lorenzo Porzi
Barbara Caputo
Elisa Ricci
Samuel Rota Buló
96
315
0
26 Apr 2017
Correlation Alignment for Unsupervised Domain Adaptation
Correlation Alignment for Unsupervised Domain Adaptation
Baochen Sun
Jiashi Feng
Kate Saenko
OOD
49
402
0
06 Dec 2016
1