Calibrated Language Models Must Hallucinate

24 November 2023
Adam Tauman Kalai, Santosh Vempala
HILM
arXiv:2311.14648

Papers citing "Calibrated Language Models Must Hallucinate" (49 papers shown)
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli, Kentaroh Toyoda, Yuan Wang, Leon Witt, Muhammad Asif Ali, Yukai Miao, Dan Li, Qingsong Wei
UQCV
25 Apr 2025

Three Types of Calibration with Properties and their Semantic and Formal Relationships
Rabanus Derr, Jessie Finocchiaro, Robert C. Williamson
25 Apr 2025

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan, Chen Henry Wu, Charles Ding, Aditi Raghunathan
21 Apr 2025

PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
Reya Vir, Shreya Shankar, Harrison Chase, Will Fu-Hinthorn, Aditya G. Parameswaran
AI4TS
20 Apr 2025

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
Yiyou Sun, Y. Gai, Lijie Chen, Abhilasha Ravichander, Yejin Choi, D. Song
HILM
17 Apr 2025

High dimensional online calibration in polynomial time
Binghui Peng
12 Apr 2025

Hallucination, reliability, and the role of generative AI in science
Charles Rathkopf
HILM
11 Apr 2025

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Wei Shen, Guanlin Liu, Zheng Wu, Ruofei Zhu, Qingping Yang, Chao Xin, Yu Yue, Lin Yan
28 Mar 2025

Estimating stationary mass, frequency by frequency
Milind Nakul, Vidya Muthukumar, A. Pananjady
17 Mar 2025

Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Botian Shi, Ding Wang
17 Mar 2025

Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection
Yihao Xue, Kristjan Greenewald, Youssef Mroueh, Baharan Mirzasoleiman
HILM
20 Feb 2025

Hallucinations are inevitable but statistically negligible
Atsushi Suzuki, Yulan He, Feng Tian, Zhongyuan Wang
HILM
15 Feb 2025

Hallucination, Monofacts, and Miscalibration: An Empirical Investigation
Muqing Miao, Michael Kearns
11 Feb 2025

Selective Response Strategies for GenAI
Boaz Taitler, Omer Ben-Porat
02 Feb 2025

Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs
Reham Omar, Omij Mangukiya, Essam Mansour
20 Jan 2025

The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, ..., Rachana Fellinger, Rui Wang, Zizhao Zhang, Sasha Goldshtein, Dipanjan Das
HILM, ALM
06 Jan 2025

Exploring Facets of Language Generation in the Limit
Moses Charikar, Chirag Pabbaraju
LRM
22 Nov 2024

Distinguishing Ignorance from Error in LLM Hallucinations
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov
HILM
29 Oct 2024

No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models
Changlong Wu, A. Grama, Wojciech Szpankowski
24 Oct 2024

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Shreya Shankar, Tristan Chambers, Eugene Wu, Aditya G. Parameswaran
LLMAG
16 Oct 2024

On Classification with Large Language Models in Cultural Analytics
David Bamman, Kent K. Chang, L. Lucy, Naitian Zhou
15 Oct 2024

An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
Ahmed Abdulaal, Hugo Fry, Nina Montaña-Brown, Ayodeji Ijishakin, Jack Gao, Stephanie L. Hyland, Daniel C. Alexander, Daniel Coelho De Castro
MedIm
04 Oct 2024

CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs
Kangsheng Wang, Xiao Zhang, Hao Liu, Songde Han, Huimin Ma, Tianyu Hu
LRM
02 Oct 2024

State space models, emergence, and ergodicity: How many parameters are needed for stable predictions?
Ingvar M. Ziemann, Nikolai Matni, George J. Pappas
20 Sep 2024

Policy Filtration in RLHF to Fine-Tune LLM for Code Generation
Wei Shen, Chuheng Zhang
OffRL
11 Sep 2024

DiPT: Enhancing LLM reasoning through diversified perspective-taking
H. Just, Mahavir Dabas, Lifu Huang, Ming Jin, Ruoxi Jia
LRM
10 Sep 2024

ContextCite: Attributing Model Generation to Context
Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry
LRM
01 Sep 2024

Understanding Generative AI Content with Embedding Models
Max Vargas, Reilly Cannon, A. Engel, Anand D. Sarwate, Tony Chiang
19 Aug 2024

Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su, Mingyu Lee, SangKeun Lee
02 Aug 2024

Automated Review Generation Method Based on Large Language Models
Shican Wu, Xiao Ma, Dehui Luo, Lulu Li, Xiangcheng Shi, ..., Ran Luo, Chunlei Pei, Zhijian Zhao, Zhi-Jian Zhao, Jinlong Gong
30 Jul 2024

Building Machines that Learn and Think with People
Katherine M. Collins, Ilia Sucholutsky, Umang Bhatt, Kartik Chandra, Lionel Wong, ..., Mark K. Ho, Vikash K. Mansinghka, Adrian Weller, Joshua B. Tenenbaum, Thomas L. Griffiths
22 Jul 2024

Towards a Science Exocortex
Kevin G. Yager
24 Jun 2024

On Subjective Uncertainty Quantification and Calibration in Natural Language Generation
Ziyu Wang, Chris Holmes
UQLM
07 Jun 2024

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, K. Johnson, Jiliang Tang, Rongrong Wang
LRM
04 Jun 2024

Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective
Fabian Falck, Ziyu Wang, Chris Holmes
02 Jun 2024

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho
HILM, ELM, AILaw
30 May 2024

Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov
HILM
15 Apr 2024

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, A. Kalyan, Karthik Narasimhan, A. Deshpande, Bruno Castro da Silva
12 Apr 2024

Language Generation in the Limit
Jon M. Kleinberg, S. Mullainathan
LRM
10 Apr 2024

Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning
Teo Susnjak, Peter Hwang, N. Reyes, A. Barczak, Timothy R. McIntosh, Surangika Ranathunga
08 Apr 2024

Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso, Martín Bertrán, Riccardo Fogliato, Aaron Roth
06 Apr 2024

Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine
HILM, LRM
08 Mar 2024

Guardrail Baselines for Unlearning in LLMs
Pratiksha Thaker, Yash Maurya, Shengyuan Hu, Zhiwei Steven Wu, Virginia Smith
MU
05 Mar 2024

On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
28 Feb 2024

On Limitations of the Transformer Architecture
Binghui Peng, Srini Narayanan, Christos H. Papadimitriou
13 Feb 2024

Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
Matthew Dahl, Varun Magesh, Mirac Suzgun, Daniel E. Ho
HILM, AILaw
02 Jan 2024

How Language Model Hallucinations Can Snowball
Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
HILM, LRM
22 May 2023

The Internal State of an LLM Knows When It's Lying
A. Azaria, Tom Michael Mitchell
HILM
26 Apr 2023

Truthful AI: Developing and governing AI that does not lie
Owain Evans, Owen Cotton-Barratt, Lukas Finnveden, Adam Bales, Avital Balwit, Peter Wills, Luca Righetti, William Saunders
HILM
13 Oct 2021