Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
arXiv:2306.13063 · 22 June 2023
Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, Bryan Hooi
Papers citing "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" (50 of 86 papers shown)
Token-Level Uncertainty Estimation for Large Language Model Reasoning (16 May 2025) [LRM]
Tunyu Zhang, Haizhou Shi, Yibin Wang, Hengyi Wang, Xiaoxiao He, ..., Ligong Han, Kai Xu, Huan Zhang, Dimitris N. Metaxas, Hao Wang

SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation (14 May 2025)
Achref Doula, M. Mühlhäuser, Alejandro Sánchez Guinea

Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection (12 May 2025) [UD]
Pei-Fu Guo, Yun-Da Tsai, Shou-De Lin

Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis (12 May 2025)
Heydar Soudani, Evangelos Kanoulas, Faegheh Hasibi

Uncertainty Quantification for Machine Learning in Healthcare: A Survey (04 May 2025)
L. J. L. Lopez, Shaza Elsharief, Dhiyaa Al Jorf, Firas Darwish, Congbo Ma, Farah E. Shamout

Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach (04 May 2025)
Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Q. Long, Weijie Su, Li Shen

Always Tell Me The Odds: Fine-grained Conditional Probability Estimation (02 May 2025)
Liaoyaqi Wang, Zhengping Jiang, Anqi Liu, Benjamin Van Durme
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding (30 Apr 2025)
Trilok Padhi, R. Kaur, Adam D. Cobb, Manoj Acharya, Anirban Roy, Colin Samplawski, Brian Matejek, Alexander M. Berenbeim, Nathaniel D. Bastian, Susmit Jha

Entropy Heat-Mapping: Localizing GPT-Based OCR Errors with Sliding-Window Shannon Analysis (30 Apr 2025)
Alexei Kaltchenko

Towards Automated Scoping of AI for Social Good Projects (28 Apr 2025)
Jacob Emmerson, Rayid Ghani, Zheyuan Ryan Shi

Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers (27 Apr 2025) [HILM]
Dylan Bouchard, Mohit Singh Chauhan

Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning (26 Apr 2025)
Abdellah Ghassel, Xianzhi Li, Xiaodan Zhu

Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review (25 Apr 2025) [UQCV]
Toghrul Abbasli, Kentaroh Toyoda, Yuan Wang, Leon Witt, Muhammad Asif Ali, Yukai Miao, Dan Li, Qingsong Wei

Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation (22 Apr 2025)
Zhiyuan Hu, Shiyun Xiong, Yifan Zhang, See-Kiong Ng, Anh Tuan Luu, Jingyi Wang, Shuicheng Yan, Bryan Hooi

Beyond Misinformation: A Conceptual Framework for Studying AI Hallucinations in (Science) Communication (18 Apr 2025)
Anqi Shao

Gauging Overprecision in LLMs: An Empirical Study (16 Apr 2025)
Adil Bahaj, Hamed Rahimi, Mohamed Chetouani, Mounir Ghogho
Hallucination Detection in LLMs via Topological Divergence on Attention Graphs (14 Apr 2025) [HILM]
Alexandra Bazarova, Aleksandr Yugay, Andrey Shulga, A. Ermilova, Andrei Volodichev, ..., Dmitry Simakov, M. Savchenko, Andrey Savchenko, Serguei Barannikov, Alexey Zaytsev

Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence (18 Mar 2025)
Sophia Hager, David Mueller, Kevin Duh, Nicholas Andrews

Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations (18 Mar 2025)
Ziwei Ji, L. Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems (05 Mar 2025) [HILM, ALM]
Richard Ren, Arunim Agarwal, Mantas Mazeika, Cristina Menghini, Robert Vacareanu, ..., Matias Geralnik, Adam Khoja, Dean Lee, Summer Yue, Dan Hendrycks

FOReCAst: The Future Outcome Reasoning and Confidence Assessment Benchmark (27 Feb 2025) [AI4TS]
Zhangdie Yuan, Zifeng Ding, Andreas Vlachos

Large Language Model Confidence Estimation via Black-Box Access (21 Feb 2025)
Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, P. Sattigeri

Hallucination Detection in Large Language Models with Metamorphic Relations (20 Feb 2025) [HILM]
Borui Yang, Md Afif Al Mamun, Jie M. Zhang, Gias Uddin

Confidence Elicitation: A New Attack Vector for Large Language Models (07 Feb 2025) [AAML]
Brian Formento, Chuan-Sheng Foo, See-Kiong Ng

SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models (03 Feb 2025) [LLMAG, HILM]
Diyana Muhammed, Gollam Rabby, Sören Auer
Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model (02 Feb 2025)
Hadas Ben-Atya, N. Gavrielov, Zvi Badash, G. Focht, R. Cytter-Kuint, Talar Hagopian, Dan Turner, M. Freiman

CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering (30 Jan 2025)
Yumeng Wang, Zhiyuan Fan, Q. Wang, May Fung, Heng Ji

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models (28 Jan 2025) [BDL, UQLM]
Yibin Wang, Haizhou Shi, Ligong Han, Dimitris N. Metaxas, Hao Wang

Boosting LLM-based Relevance Modeling with Distribution-Aware Robust Learning (17 Dec 2024)
Hong Liu, Saisai Gong, Yixin Ji, Kaixin Wu, Jia Xu, Jinjie Gu

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models (07 Dec 2024) [UQCV]
Haizhou Shi, Yibin Wang, Ligong Han, Huan Zhang, Hao Wang

Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the Effect of Epistemic Markers on LLM-based Evaluation (28 Oct 2024) [ELM]
Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung

DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning (26 Oct 2024)
Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Zhicheng Dou

Do LLMs Estimate Uncertainty Well in Instruction-Following? (18 Oct 2024) [ELM]
Juyeon Heo, Miao Xiong, Christina Heinze-Deml, Jaya Narain

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation (17 Oct 2024) [LRM]
Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui-cang Wang

On Calibration of LLM-based Guard Models for Reliable Content Moderation (14 Oct 2024)
Hongfu Liu, Hengguan Huang, Hao Wang, Xiangming Gu, Ye Wang

Taming Overconfidence in LLMs: Reward Calibration in RLHF (13 Oct 2024)
Jixuan Leng, Chengsong Huang, Banghua Zhu, Jiaxin Huang
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees (10 Oct 2024) [LRM]
Weibin Liao, Xu Chu, Yasha Wang

Output Scouting: Auditing Large Language Models for Catastrophic Responses (04 Oct 2024) [KELM]
Andrew Bell, Joao Fonseca

Integrative Decoding: Improve Factuality via Implicit Self-consistency (02 Oct 2024) [HILM]
Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, ..., Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong

Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition (28 Sep 2024)
Minseo Kwon, Yaesol Kim, Young J. Kim

To CoT or not to CoT? Chain-of-thought Helps Mainly on Math and Symbolic Reasoning (18 Sep 2024) [ReLM, LRM]
Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett

Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration (05 Sep 2024)
Jeremy Qin, Bang Liu, Quoc Dinh Nguyen

MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty (13 Aug 2024)
Yongjin Yang, Haneul Yoo, Hwaran Lee

Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection (01 Aug 2024) [3DPC]
A. Benfenati, P. Causin, Hang Yu, Zhedong Zheng
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs (27 Jul 2024)
Nitay Calderon, Roi Reichart

LLM Internal States Reveal Hallucination Risk Faced With a Query (03 Jul 2024) [HILM, LRM]
Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung

PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models (26 Jun 2024)
Huixuan Zhang, Yun Lin, Xiaojun Wan

Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs (17 Jun 2024) [LRM]
Yi Fang, Moxin Li, Wenjie Wang, Hui Lin, Fuli Feng

Cycles of Thought: Measuring LLM Confidence through Stable Explanations (05 Jun 2024)
Evan Becker, Stefano Soatto

Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners (01 Jun 2024)
Zhi Zheng, Qian Feng, Hang Li, Alois C. Knoll, Jianxiang Feng