arXiv: 2305.14975
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
24 May 2023
Katherine Tian
E. Mitchell
Allan Zhou
Archit Sharma
Rafael Rafailov
Huaxiu Yao
Chelsea Finn
Christopher D. Manning
Papers citing
"Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback"
50 / 232 papers shown
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Ruixin Yang
Dheeraj Rajagopal
S. Hayati
Bin Hu
Dongyeop Kang
LLMAG
43
5
0
14 Apr 2024
MetaCheckGPT -- A Multi-task Hallucination Detector Using LLM Uncertainty and Meta-models
Rahul Mehta
Andrew Hoblitzell
Jack O'Keefe
Hyeju Jang
Vasudeva Varma
HILM
KELM
19
0
0
10 Apr 2024
Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso
Martín Bertrán
Riccardo Fogliato
Aaron Roth
47
12
0
06 Apr 2024
Uncertainty in Language Models: Assessment through Rank-Calibration
Xinmeng Huang
Shuo Li
Mengxin Yu
Matteo Sesia
Hamed Hassani
Insup Lee
Osbert Bastani
Yan Sun
43
16
0
04 Apr 2024
Empowering Biomedical Discovery with AI Agents
Shanghua Gao
Ada Fang
Yepeng Huang
Valentina Giunchiglia
Ayush Noori
Jonathan Richard Schwarz
Yasha Ektefaie
Jovana Kondic
Marinka Zitnik
LLMAG
AI4CE
46
67
0
03 Apr 2024
Calibrating the Confidence of Large Language Models by Eliciting Fidelity
Mozhi Zhang
Mianqiu Huang
Rundong Shi
Linsen Guo
Chong Peng
Peng Yan
Yaqian Zhou
Xipeng Qiu
29
10
0
03 Apr 2024
Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing
Zhenyu Qian
Yiming Qian
Yuting Song
Fei Gao
Hai Jin
Chen Yu
Xia Xie
45
0
0
31 Mar 2024
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu
Zichen Zhu
Situo Zhang
Da Ma
Shuai Fan
Lu Chen
Kai Yu
HILM
39
35
0
27 Mar 2024
Few-Shot Recalibration of Language Models
Xiang Lisa Li
Urvashi Khandelwal
Kelvin Guu
49
5
0
27 Mar 2024
Third-Party Language Model Performance Prediction from Instruction
Rahul Nadkarni
Yizhong Wang
Noah A. Smith
ELM
LRM
53
0
0
19 Mar 2024
Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
Moxin Li
Wenjie Wang
Fuli Feng
Fengbin Zhu
Qifan Wang
Tat-Seng Chua
HILM
LRM
46
13
0
15 Mar 2024
Couler: Unified Machine Learning Workflow Optimization in Cloud
Xiaoda Wang
Yuan-ju Tang
Tengda Guo
Bo Sang
Jingji Wu
Jian Sha
Ke Zhang
Jiang Qian
Mingjie Tang
33
0
0
12 Mar 2024
Calibrating Large Language Models Using Their Generations Only
Dennis Ulmer
Martin Gubri
Hwaran Lee
Sangdoo Yun
Seong Joon Oh
UQLM
432
18
1
09 Mar 2024
Bayesian Preference Elicitation with Language Models
Kunal Handa
Yarin Gal
Ellie Pavlick
Noah D. Goodman
Jacob Andreas
Alex Tamkin
Belinda Z. Li
42
12
0
08 Mar 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Katie Kang
Eric Wallace
Claire Tomlin
Aviral Kumar
Sergey Levine
HILM
LRM
46
49
0
08 Mar 2024
LLMs for Targeted Sentiment in News Headlines: Exploring the Descriptive-Prescriptive Dilemma
Jana Juros
Laura Majer
Jan Snajder
31
2
0
01 Mar 2024
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension
Fan Yin
Jayanth Srinivasa
Kai-Wei Chang
HILM
60
20
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
56
17
0
28 Feb 2024
Predict the Next Word: Humans exhibit uncertainty in this task and language models _____
Evgenia Ilia
Wilker Aziz
34
2
0
27 Feb 2024
Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models
Xinran Zhao
Hongming Zhang
Xiaoman Pan
Wenlin Yao
Dong Yu
Tongshuang Wu
Jianshu Chen
HILM
LRM
32
4
0
27 Feb 2024
C³: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding
Taixi Lu
Haoyu Wang
Huajie Shao
Jing Gao
Huaxiu Yao
35
0
0
25 Feb 2024
Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan
Jack Hessel
Tanmay Gupta
Bill Yuchen Lin
Yejin Choi
Jesse Thomason
Khyathi Raghavi Chandu
26
7
0
23 Feb 2024
Soft Self-Consistency Improves Language Model Agents
Han Wang
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMAG
24
8
0
20 Feb 2024
Thermometer: Towards Universal Calibration for Large Language Models
Maohao Shen
Subhro Das
Kristjan Greenewald
P. Sattigeri
Greg Wornell
Soumya Ghosh
67
9
0
20 Feb 2024
Uncertainty quantification in fine-tuned LLMs using LoRA ensembles
Oleksandr Balabanov
Hampus Linander
UQCV
36
14
0
19 Feb 2024
Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection
Min Zhang
Jianfeng He
Taoran Ji
Chang-Tien Lu
33
11
0
18 Feb 2024
Multi-Perspective Consistency Enhances Confidence Estimation in Large Language Models
Pei Wang
Yejie Wang
Muxi Diao
Keqing He
Guanting Dong
Weiran Xu
26
0
0
17 Feb 2024
Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models
Hanxing Ding
Liang Pang
Zihao Wei
Huawei Shen
Xueqi Cheng
HILM
RALM
81
16
0
16 Feb 2024
DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection
Herun Wan
Shangbin Feng
Zhaoxuan Tan
Heng Wang
Yulia Tsvetkov
Minnan Luo
72
29
0
16 Feb 2024
Language Models with Conformal Factuality Guarantees
Christopher Mohri
Tatsunori Hashimoto
HILM
44
33
0
15 Feb 2024
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang
Baolin Peng
Ye Tian
Jingyan Zhou
Lifeng Jin
Linfeng Song
Haitao Mi
Helen Meng
HILM
42
45
0
14 Feb 2024
Understanding the Effects of Iterative Prompting on Truthfulness
Satyapriya Krishna
Chirag Agarwal
Himabindu Lakkaraju
HILM
30
9
0
09 Feb 2024
Calibrating Long-form Generations from Large Language Models
Yukun Huang
Yixin Liu
Raghuveer Thirukovalluru
Arman Cohan
Bhuwan Dhingra
27
7
0
09 Feb 2024
NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning
Yufeng Zhao
Yoshihiro Sakai
Naoya Inoue
33
4
0
08 Feb 2024
Reconfidencing LLMs from the Grouping Loss Perspective
Lihu Chen
Alexandre Perez-Lebel
Fabian M. Suchanek
Gaël Varoquaux
195
8
0
07 Feb 2024
ANLS* -- A Universal Document Processing Metric for Generative Large Language Models
David Peer
Philemon Schöpf
V. Nebendahl
A. Rietzler
Sebastian Stabinger
27
3
0
06 Feb 2024
Distinguishing the Knowable from the Unknowable with Language Models
Gustaf Ahdritz
Tian Qin
Nikhil Vyas
Boaz Barak
Benjamin L. Edelman
37
18
0
05 Feb 2024
Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
Anthony Sicilia
Hyunwoo J. Kim
Khyathi Raghavi Chandu
Malihe Alikhani
Jack Hessel
20
1
0
05 Feb 2024
Calibration and Correctness of Language Models for Code
Claudio Spiess
David Gros
Kunal Suresh Pai
Michael Pradel
Md Rafiqul Islam Rabin
Amin Alipour
Susmit Jha
Prem Devanbu
Toufique Ahmed
68
19
0
03 Feb 2024
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Vidhisha Balachandran
Yulia Tsvetkov
31
78
0
01 Feb 2024
Towards Uncertainty-Aware Language Agent
Jiuzhou Han
Wray L. Buntine
Ehsan Shareghi
LLMAG
AI4CE
32
5
0
25 Jan 2024
Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation
Mauricio Rivera
Jean-François Godbout
Reihaneh Rabbany
Kellin Pelrine
HILM
23
9
0
13 Jan 2024
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
Kaitlyn Zhou
Jena D. Hwang
Xiang Ren
Maarten Sap
36
54
0
12 Jan 2024
Large Language Models for Social Networks: Applications, Challenges, and Solutions
Jingying Zeng
Richard Huang
Waleed Malik
Langxuan Yin
Bojan Babic
Danny Shacham
Xiao Yan
Jaewon Yang
Qi He
22
7
0
04 Jan 2024
Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
Matthew Dahl
Varun Magesh
Mirac Suzgun
Daniel E. Ho
HILM
AILaw
25
73
0
02 Jan 2024
LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis
Jinwen He
Yujia Gong
Kai-xiang Chen
Zijin Lin
Cheng'an Wei
Yue Zhao
29
3
0
27 Dec 2023
Self-Evaluation Improves Selective Generation in Large Language Models
Jie Jessie Ren
Yao-Min Zhao
Tu Vu
Peter J. Liu
Balaji Lakshminarayanan
ELM
31
34
0
14 Dec 2023
On Diversified Preferences of Large Language Model Alignment
Dun Zeng
Yong Dai
Pengyu Cheng
Longyue Wang
Tianhao Hu
Wanshun Chen
Nan Du
Zenglin Xu
ALM
38
16
0
12 Dec 2023
Alignment for Honesty
Yuqing Yang
Ethan Chern
Xipeng Qiu
Graham Neubig
Pengfei Liu
44
30
0
12 Dec 2023
A Study on the Calibration of In-context Learning
Hanlin Zhang
Yi-Fan Zhang
Yaodong Yu
Dhruv Madeka
Dean Phillips Foster
Eric Xing
Hima Lakkaraju
Sham Kakade
34
7
0
07 Dec 2023