Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian, E. Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
24 May 2023. arXiv: 2305.14975
Papers citing "Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback" (50 of 230 papers shown):
A Survey of Calibration Process for Black-Box LLMs. Liangru Xie, Hui Liu, Jingying Zeng, Xianfeng Tang, Yan Han, Chen Luo, Jing Huang, Zhen Li, Suhang Wang, Qi He. 17 Dec 2024.
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models. Boyang Xue, Fei Mi, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Erxin Yu, Xuming Hu, Kam-Fai Wong. 16 Dec 2024. [HILM]
JuStRank: Benchmarking LLM Judges for System Ranking. Ariel Gera, Odellia Boni, Yotam Perlitz, Roy Bar-Haim, Lilach Eden, Asaf Yehudai. 12 Dec 2024. [ALM, ELM]
SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration. Yuanhao Shen, Xiaodan Zhu, L. Chen. 11 Dec 2024.
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models. Haizhou Shi, Yibin Wang, Ligong Han, Huan Zhang, Hao Wang. 07 Dec 2024. [UQCV]
Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning. R. Krishnan, Piyush Khanna, Omesh Tickoo. 03 Dec 2024. [HILM]
Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator. Frederic Kirstein, Terry Ruas, Bela Gipp. 27 Nov 2024.
Text-to-SQL Calibration: No Need to Ask -- Just Rescale Model Probabilities. Ashwin Ramachandran, Sunita Sarawagi. 23 Nov 2024.
Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding. Nabeel Seedat, Caterina Tozzi, Andrea Hita Ardiaca, M. Schaar, James Weatherall, Adam Taylor. 20 Nov 2024.
Graph-based Confidence Calibration for Large Language Models. Yukun Li, Sijia Wang, Lifu Huang, Li-Ping Liu. 03 Nov 2024. [UQCV]
Matchmaker: Self-Improving Large Language Model Programs for Schema Matching. Nabeel Seedat, M. Schaar. 31 Oct 2024.
Dynamic Strategy Planning for Efficient Question Answering with Large Language Models. Tanmay Parekh, Pradyot Prakash, Alexander Radovic, Akshay Shekher, Denis Savenkov. 30 Oct 2024. [LRM]
Graph-based Uncertainty Metrics for Long-form Language Model Outputs. Mingjian Jiang, Yangjun Ruan, Prasanna Sattigeri, Salim Roukos, Tatsunori Hashimoto. 28 Oct 2024.
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation. Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung. 28 Oct 2024. [ELM]
EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering. Kai Cheng, Zhengyuan Li, Xingpeng Sun, Byung-Cheol Min, Amrit Singh Bedi, Aniket Bera. 26 Oct 2024.
Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models. Mohammad Beigi, Sijia Wang, Ying Shen, Zihao Lin, Adithya Kulkarni, ..., Ming Jin, Jin-Hee Cho, Dawei Zhou, Chang-Tien Lu, Lifu Huang. 26 Oct 2024.
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies. Liwen Wang, Sheng Chen, Linnan Jiang, Shu Pan, Runze Cai, Sen Yang, Fei Yang. 24 Oct 2024.
A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice. Hsiu-Yuan Huang, Yutong Yang, Zhaoxi Zhang, Sanwoo Lee, Yunfang Wu. 20 Oct 2024.
LoGU: Long-form Generation with Uncertainty Expressions. Ruihan Yang, Caiqi Zhang, Zhisong Zhang, Xinting Huang, Sen Yang, Nigel Collier, Dong Yu, Deqing Yang. 18 Oct 2024. [HILM]
Do LLMs estimate uncertainty well in instruction-following? Juyeon Heo, Miao Xiong, Christina Heinze-Deml, Jaya Narain. 18 Oct 2024. [ELM]
Accounting for Sycophancy in Language Model Uncertainty Estimation. Anthony Sicilia, Mert Inan, Malihe Alikhani. 17 Oct 2024.
Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors. Anthony Sicilia, Malihe Alikhani. 17 Oct 2024.
Learning to Route LLMs with Confidence Tokens. Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, Xia Hu. 17 Oct 2024.
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation. Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui-cang Wang. 17 Oct 2024. [LRM]
LLM Confidence Evaluation Measures in Zero-Shot CSS Classification. David Farr, Iain Cruickshank, Nico Manzonelli, Nicholas Clark, Kate Starbird, Jevin West. 16 Oct 2024.
MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models. Boyang Xue, Hongru Wang, Rui Wang, Sheng Wang, Zezhong Wang, Yiming Du, Bin Liang, Kam-Fai Wong. 16 Oct 2024.
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only. Jihan Yao, Wenxuan Ding, Shangbin Feng, Lucy Lu Wang, Yulia Tsvetkov. 14 Oct 2024.
QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios. Timo Pierre Schrader, Lukas Lange, Simon Razniewski, Annemarie Friedrich. 14 Oct 2024. [UQLM]
Taming Overconfidence in LLMs: Reward Calibration in RLHF. Jixuan Leng, Chengsong Huang, Banghua Zhu, Jiaxin Huang. 13 Oct 2024.
Calibrating Verbalized Probabilities for Large Language Models. Cheng Wang, Gyuri Szarvas, Georges Balazs, Pavel Danchenko, P. Ernst. 09 Oct 2024.
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs. Ruijia Niu, D. Wu, Rose Yu, Yi Ma. 09 Oct 2024.
Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models. Bozhou Li, Hao Liang, Yang Li, Fangcheng Fu, Hongzhi Yin, Conghui He, Wentao Zhang. 08 Oct 2024. [KELM, CLL]
Calibrating Expressions of Certainty. Peiqi Wang, Barbara D. Lam, Yingcheng Liu, Ameneh Asgari-Targhi, Yikang Shen, W. Wells, Tina Kapur, Polina Golland. 06 Oct 2024.
Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference. Wei Cheng, Tianlu Wang, Yanmin Ji, Fan Yang, Keren Tan, Yiyu Zheng. 03 Oct 2024.
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations. Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov. 03 Oct 2024. [HILM, AIFin]
FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization. Mingye Zhu, Yi Liu, Quan Wang, Junbo Guo, Zhendong Mao. 01 Oct 2024.
Calibrating Language Models with Adaptive Temperature Scaling. Johnathan Xie, Annie S. Chen, Yoonho Lee, Eric Mitchell, Chelsea Finn. 29 Sep 2024.
A Survey on the Honesty of Large Language Models. Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, ..., Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam. 27 Sep 2024. [HILM]
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection. Xuefeng Du, Chaowei Xiao, Yixuan Li. 26 Sep 2024. [HILM]
Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework. Lu Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng. 24 Sep 2024.
Confidence Estimation for LLM-Based Dialogue State Tracking. Yi-Jyun Sun, Suvodip Dey, Dilek Z. Hakkani-Tür, Gokhan Tur. 15 Sep 2024.
What is the Role of Small Models in the LLM Era: A Survey. Lihu Chen, Gaël Varoquaux. 10 Sep 2024. [ALM]
Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration. Jeremy Qin, Bang Liu, Quoc Dinh Nguyen. 05 Sep 2024.
Does Alignment Tuning Really Break LLMs' Internal Confidence? Hongseok Oh, Wonseok Hwang. 31 Aug 2024.
Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain. Francesca Grasso, Stefano Locci. 30 Aug 2024.
Can Unconfident LLM Annotations Be Used for Confident Conclusions? Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky. 27 Aug 2024.
Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks? Urja Khurana, Eric T. Nalisnick, Antske Fokkens, Swabha Swayamdipta. 26 Aug 2024.
Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence? Shiyu Ni, Keping Bi, Lulu Yu, Jiafeng Guo. 19 Aug 2024. [HILM]
How Susceptible are LLMs to Influence in Prompts? Sotiris Anagnostidis, Jannis Bulian. 17 Aug 2024. [LRM]
Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models. Wenbo Zhang, Zihang Xu, Hengrui Cai. 11 Aug 2024.