Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.22660
Cited By
v1
v2 (latest)
Maximizing Confidence Alone Improves Reasoning
28 May 2025
Mihir Prabhudesai
Lili Chen
Alex Ippoliti
Katerina Fragkiadaki
Hao Liu
Deepak Pathak
OOD
OffRL
ReLM
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Maximizing Confidence Alone Improves Reasoning"
47 / 47 papers shown
Title
Reasoning Models Better Express Their Confidence
Dongkeun Yoon
Seungone Kim
Sohee Yang
Sunkyoung Kim
Soyeon Kim
Yongil Kim
Eunbi Choi
Yireun Kim
Minjoon Seo
LRM
37
3
0
20 May 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
390
31
0
22 Apr 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
373
1,967
0
22 Jan 2025
On Verbalized Confidence Scores for LLMs
Daniel Yang
Yao-Hung Hubert Tsai
M. Yamada
87
11
0
19 Dec 2024
A Survey of Calibration Process for Black-Box LLMs
Liangru Xie
Hui Liu
Jingying Zeng
Xianfeng Tang
Yan Han
Chen Luo
Jing Huang
Zhen Li
Suhang Wang
Qi He
125
4
0
17 Dec 2024
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Tianyang Xu
Shujin Wu
Shizhe Diao
Xiaoze Liu
Xingyao Wang
Yangyi Chen
Jing Gao
LRM
79
43
0
31 May 2024
Calibration of Large Language Models on Code Summarization
Yuvraj Virk
Prem Devanbu
Toufique Ahmed
89
11
0
30 Apr 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
126
55
0
15 Apr 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM
AIMat
120
273
0
21 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
138
1,238
0
05 Feb 2024
Calibration and Correctness of Language Models for Code
Claudio Spiess
David Gros
Kunal Suresh Pai
Michael Pradel
Md Rafiqul Islam Rabin
Amin Alipour
Susmit Jha
Prem Devanbu
Toufique Ahmed
91
26
0
03 Feb 2024
Towards Uncertainty-Aware Language Agent
Paul Burgess
Wray Buntine
Ehsan Shareghi
LLMAG
AI4CE
85
6
0
25 Jan 2024
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Avi Singh
John D. Co-Reyes
Rishabh Agarwal
Ankesh Anand
Piyush Patil
...
Yamini Bansal
Ethan Dyer
Behnam Neyshabur
Jascha Narain Sohl-Dickstein
Noah Fiedel
ALM
LRM
ReLM
SyDa
213
189
0
11 Dec 2023
Calibrated Language Models Must Hallucinate
Adam Tauman Kalai
Santosh Vempala
HILM
63
84
0
24 Nov 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
113
728
0
20 Nov 2023
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling
Bairu Hou
Yujian Liu
Kaizhi Qian
Jacob Andreas
Shiyu Chang
Yang Zhang
UD
UQCV
PER
65
63
0
15 Nov 2023
A Survey of Confidence Estimation and Calibration in Large Language Models
Jiahui Geng
Fengyu Cai
Yuxia Wang
Heinz Koeppl
Preslav Nakov
Iryna Gurevych
UQCV
114
77
0
14 Nov 2023
Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
Seongun Kim
Kyowoon Lee
Jaesik Choi
SSL
DRL
70
10
0
30 Oct 2023
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta
Nils Blach
Aleš Kubíček
Robert Gerstenberger
Michal Podstawski
...
Joanna Gajda
Tomasz Lehmann
H. Niewiadomski
Piotr Nyczyk
Torsten Hoefler
LRM
AI4CE
LM&Ro
148
701
0
18 Aug 2023
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Neeraj Varshney
Wenlin Yao
Hongming Zhang
Jianshu Chen
Dong Yu
HILM
109
173
0
08 Jul 2023
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Miao Xiong
Zhiyuan Hu
Xinyang Lu
Yifei Li
Jie Fu
Junxian He
Bryan Hooi
203
446
0
22 Jun 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
193
1,228
0
31 May 2023
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian
E. Mitchell
Allan Zhou
Archit Sharma
Rafael Rafailov
Huaxiu Yao
Chelsea Finn
Christopher D. Manning
113
354
0
24 May 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao
Dian Yu
Jeffrey Zhao
Izhak Shafran
Thomas Griffiths
Yuan Cao
Karthik Narasimhan
LM&Ro
LRM
AI4CE
150
2,010
0
17 May 2023
Active Retrieval Augmented Generation
Zhengbao Jiang
Frank F. Xu
Luyu Gao
Zhiqing Sun
Qian Liu
Jane Dwivedi-Yu
Yiming Yang
Jamie Callan
Graham Neubig
RALM
66
288
0
11 May 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark Gales
HILM
LRM
189
439
0
15 Mar 2023
Solving math word problems with process- and outcome-based feedback
J. Uesato
Nate Kushman
Ramana Kumar
Francis Song
Noah Y. Siegel
L. Wang
Antonia Creswell
G. Irving
I. Higgins
FaML
ReLM
AIMat
LRM
106
355
0
25 Nov 2022
Language Models (Mostly) Know What They Know
Saurav Kadavath
Tom Conerly
Amanda Askell
T. Henighan
Dawn Drain
...
Nicholas Joseph
Benjamin Mann
Sam McCandlish
C. Olah
Jared Kaplan
ELM
122
826
0
11 Jul 2022
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM
LRM
142
508
0
28 Mar 2022
Competition-Level Code Generation with AlphaCode
Yujia Li
David Choi
Junyoung Chung
Nate Kushman
Julian Schrittwieser
...
Esme Sutherland Robson
Pushmeet Kohli
Nando de
Koray Kavukcuoglu
Oriol Vinyals
143
1,413
0
08 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
823
9,576
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
317
4,533
0
27 Oct 2021
Behavior From the Void: Unsupervised Active Pre-Training
Hao Liu
Pieter Abbeel
VLM
SSL
91
202
0
08 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM
FaML
181
2,356
0
05 Mar 2021
Reinforcement Learning with Prototypical Representations
Denis Yarats
Rob Fergus
A. Lazaric
Lerrel Pinto
SSL
73
226
0
22 Feb 2021
Improving robustness against common corruptions by covariate shift adaptation
Steffen Schneider
E. Rusak
L. Eck
Oliver Bringmann
Wieland Brendel
Matthias Bethge
VLM
99
482
0
30 Jun 2020
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift
Zachary Nado
Shreyas Padhy
D. Sculley
Alexander DÁmour
Balaji Lakshminarayanan
Jasper Snoek
OOD
AI4TS
92
248
0
19 Jun 2020
MixMatch: A Holistic Approach to Semi-Supervised Learning
David Berthelot
Nicholas Carlini
Ian Goodfellow
Nicolas Papernot
Avital Oliver
Colin Raffel
151
3,033
0
06 May 2019
Semi-supervised Domain Adaptation via Minimax Entropy
Kuniaki Saito
Donghyun Kim
Stan Sclaroff
Trevor Darrell
Kate Saenko
75
622
0
13 Apr 2019
Exploration by Random Network Distillation
Yuri Burda
Harrison Edwards
Amos Storkey
Oleg Klimov
159
1,342
0
30 Oct 2018
A DIRT-T Approach to Unsupervised Domain Adaptation
Rui Shu
Hung Bui
Hirokazu Narui
Stefano Ermon
75
625
0
23 Feb 2018
Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach
Abhishek Gupta
Julian Ibarz
Sergey Levine
103
1,088
0
16 Feb 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
526
19,237
0
20 Jul 2017
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
216
3,364
0
12 Jun 2017
Curiosity-driven Exploration by Self-supervised Prediction
Deepak Pathak
Pulkit Agrawal
Alexei A. Efros
Trevor Darrell
LRM
SSL
113
2,449
0
15 May 2017
AutoDIAL: Automatic DomaIn Alignment Layers
Fabio Maria Carlucci
Lorenzo Porzi
Barbara Caputo
Elisa Ricci
Samuel Rota Buló
96
315
0
26 Apr 2017
Correlation Alignment for Unsupervised Domain Adaptation
Baochen Sun
Jiashi Feng
Kate Saenko
OOD
49
402
0
06 Dec 2016
1