Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov
arXiv:2404.09971, 15 April 2024. [HILM]

Cited By
Papers citing "Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs" (12 of 12 papers shown):
"Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home" (24 Feb 2025). Viktor Moskvoretskii, M. Lysyuk, Mikhail Salnikov, Nikolay Ivanov, Sergey Pletenev, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Irina Nikishina, Alexander Panchenko. [RALM]
"LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations" (03 Oct 2024). Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov. [HILM, AIFin]
"Integrative Decoding: Improve Factuality via Implicit Self-consistency" (02 Oct 2024). Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, ..., Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong. [HILM]
"Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models" (29 Feb 2024). Hongbang Yuan, Pengfei Cao, Zhuoran Jin, Yubo Chen, Daojian Zeng, Kang Liu, Jun Zhao. [HILM]
"Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension" (28 Feb 2024). Fan Yin, Jayanth Srinivasa, Kai-Wei Chang. [HILM]
"Towards Understanding Sycophancy in Language Models" (20 Oct 2023). Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez.
"The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets" (10 Oct 2023). Samuel Marks, Max Tegmark. [HILM]
"How Language Model Hallucinations Can Snowball" (22 May 2023). Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith. [HILM, LRM]
"The Internal State of an LLM Knows When It's Lying" (26 Apr 2023). A. Azaria, Tom Michael Mitchell. [HILM]
"SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models" (15 Mar 2023). Potsawee Manakul, Adian Liusie, Mark J. F. Gales. [HILM, LRM]
"Training language models to follow instructions with human feedback" (04 Mar 2022). Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. [OSLM, ALM]
"Similarity Analysis of Contextual Word Representation Models" (03 May 2020). John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass.