The Internal State of an LLM Knows When It's Lying
A. Azaria, Tom Michael Mitchell · HILM · 26 April 2023

Papers citing "The Internal State of an LLM Knows When It's Lying" (44 papers)

Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs
Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, ..., Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng · HILM · 23 May 2025

Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models
Yukin Zhang, Qi Dong · 23 May 2025

When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
Yuqing Yang, Robin Jia · KELM, LRM · 22 May 2025

Social preferences with unstable interactive reasoning: Large language models in economic trust games
Ou Jiamin, Eikmans Emile, Buskens Vincent, Pankowska Paulina, Shan Yuli · ReLM, LRM · 16 May 2025

Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji, L. Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda · 18 Mar 2025

HalluCounter: Reference-free LLM Hallucination Detection in the Wild!
Ashok Urlana, Gopichand Kanumolu, Charaka Vinayak Kumar, B. Garlapati, Rahul Mishra · HILM · 06 Mar 2025

Similarity-Distance-Magnitude Universal Verification
Allen Schmaltz · UQCV, AAML · 27 Feb 2025

CER: Confidence Enhanced Reasoning in LLMs
Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah · LRM · 20 Feb 2025

Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
Wichayaporn Wongkamjan, Yanze Wang, Feng Gu, Denis Peskoff, Jonathan K. Kummerfeld, Jonathan May, Jordan Lee Boyd-Graber · 18 Feb 2025

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Yibin Wang, Haizhou Shi, Ligong Han, Dimitris N. Metaxas, Hao Wang · BDL, UQLM · 28 Jan 2025

Citations and Trust in LLM Generated Responses
Yifan Ding, Matthew Facciani, Amrit Poudel, Ellen Joyce, Salvador Aguiñaga, Balaji Veeramani, Sanmitra Bhattacharya, Tim Weninger · HILM · 03 Jan 2025

ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong, Xinwei Wu, Renren Jin, Shaoyang Xu, Deyi Xiong · 31 Dec 2024

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, Neel Nanda · 21 Nov 2024

Prompt-Guided Internal States for Hallucination Detection of Large Language Models
Fujie Zhang, Peiqi Yu, Biao Yi, Baolei Zhang, Tong Li, Zheli Liu · HILM, LRM · 07 Nov 2024

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Boxing Chen, Sarath Chandar · 22 Oct 2024

LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang, Kai Kiat Goh, Peixin Zhang, Jun Sun, Rose Lin Xin, Hongyu Zhang · 22 Oct 2024

Do LLMs "know" internally when they follow instructions?
Do LLMs "know" internally when they follow instructions?
Juyeon Heo
Christina Heinze-Deml
Oussama Elachqar
Shirley Ren
Udhay Nallasamy
Andy Miller
Kwan Ho Ryan Chan
Jaya Narain
107
10
0
18 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui Wang · LRM · 17 Oct 2024

ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
Zhongxiang Sun, Xiaoxue Zang, Kai Zheng, Yang Song, Jun Xu, Xiao Zhang, Weijie Yu, Yang Song, Han Li · 15 Oct 2024

Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi, Eric Horvitz, Besmira Nushi · LLMSV · 15 Oct 2024

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov · HILM, AIFin · 03 Oct 2024

Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Nick Jiang, Anish Kachinthaya, Suzie Petryk, Yossi Gandelsman · VLM · 03 Oct 2024

Integrative Decoding: Improve Factuality via Implicit Self-consistency
Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, ..., Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong · HILM · 02 Oct 2024

Uncovering Latent Chain of Thought Vectors in Language Models
Jason Zhang, Scott Viteri · LLMSV, LRM · 21 Sep 2024

Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
Sania Nayab, Giulio Rossolini, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli · LRM · 29 Jul 2024

Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs
D. Yaldiz, Yavuz Faruk Bakman, Baturalp Buyukates, Chenyang Tao, Anil Ramakrishna, Dimitrios Dimitriadis, Jieyu Zhao, Salman Avestimehr · 17 Jun 2024

A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
Bairu Hou, Yang Zhang, Jacob Andreas, Shiyu Chang · 11 Jun 2024

Standards for Belief Representations in LLMs
Daniel A. Herrmann, B. Levinstein · 31 May 2024

Prompting open-source and commercial language models for grammatical error correction of English learner text
Christopher Davis, Andrew Caines, Oistein Andersen, Shiva Taslimipoor, H. Yannakoudakis, Zheng Yuan, Christopher Bryant, Marek Rei, P. Buttery · 15 Jan 2024

Ragas: Automated Evaluation of Retrieval Augmented Generation
ES Shahul, Jithin James, Luis Espinosa-Anke, Steven Schockaert · 26 Sep 2023

PaLM-E: An Embodied Multimodal Language Model
Danny Driess, F. Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, ..., Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Peter R. Florence · LM&Ro · 06 Mar 2023

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, ..., Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao · KELM, HILM, LRM · 24 Feb 2023

GPT Takes the Bar Exam
M. Bommarito, Daniel Martin Katz · ELM · 29 Dec 2022

Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better
David Dale, Elena Voita, Loïc Barrault, Marta R. Costa-jussà · HILM · 16 Dec 2022

Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt · 07 Dec 2022

Fine-tuning language models to find agreement among humans with diverse preferences
Michiel A. Bakker, Martin Chadwick, Hannah R. Sheahan, Michael Henry Tessler, Lucy Campbell-Gillingham, ..., Nat McAleese, Amelia Glaese, John Aslanides, M. Botvinick, Christopher Summerfield · ALM · 28 Nov 2022

OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, ..., Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer · VLM, OSLM, AI4CE · 02 May 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe · OSLM, ALM · 04 Mar 2022

Survey of Hallucination in Natural Language Generation
Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, D. Su, ..., Delong Chen, Wenliang Dai, Ho Shu Chan, Andrea Madotto, Pascale Fung · HILM, LRM · 08 Feb 2022

Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov · HILM · 27 Apr 2021

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · BDL · 28 May 2020

Wizard of Wikipedia: Knowledge-Powered Conversational Agents
Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston · RALM, KELM · 03 Nov 2018

FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal · HILM · 14 Mar 2018

Overcoming catastrophic forgetting in neural networks
J. Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, J. Veness, Guillaume Desjardins, ..., A. Grabska-Barwinska, Demis Hassabis, Claudia Clopath, D. Kumaran, R. Hadsell · CLL · 02 Dec 2016