Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks
B. Levinstein, Daniel A. Herrmann
30 June 2023 · arXiv:2307.00175 (abs / PDF / HTML)

Papers citing "Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks"

46 / 46 papers shown
Explainability Through Systematicity: The Hard Systematicity Challenge for Artificial Intelligence
Matthieu Queloz
29 Jul 2025

Mechanistic Indicators of Understanding in Large Language Models
Pierre Beckmann, Matthieu Queloz
07 Jul 2025

Detecting High-Stakes Interactions with Activation Probes
Alex McKenzie, Urja Pawar, Phil Blandfort, William Bankes, David M. Krueger, Ekdeep Singh Lubana, Dmitrii Krasheninnikov
12 Jun 2025

The Geometries of Truth Are Orthogonal Across Tasks
Waiss Azizian, Michael Kirchhof, Eugène Ndiaye, Louis Béthune, Michal Klein, Pierre Ablin, Marco Cuturi
10 Jun 2025

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau
Communities: LM&MA, AI4CE
05 Jun 2025

Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks
Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Zhengwen Feng, Hao Peng, Jianwei Yin
Communities: HILM
01 Jun 2025

HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
Qing Li, Jiahui Geng, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray
30 May 2025

When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
Yuqing Yang, Robin Jia
Communities: KELM, LRM
22 May 2025

Exploring the generalization of LLM truth directions on conversational formats
Timour Ichmoukhamedov, David Martens
14 May 2025

Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple
24 Feb 2025

Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger
Wenjun Li, Dexun Li, Kuicai Dong, Cong Zhang, Hao Zhang, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Liu
Communities: LLMAG, KELM
18 Feb 2025

Unraveling Token Prediction Refinement and Identifying Essential Layers in Language Models
Jaturong Kongmanee
25 Jan 2025

Representation in large language models
Cameron C. Yetman
03 Jan 2025

HalluCana: Fixing LLM Hallucination with A Canary Lookahead
Tianyi Li, Erenay Dayanik, Shubhi Tyagi, Andrea Pierleoni
Communities: HILM
10 Dec 2024

A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios
Xiachong Feng, Longxu Dou, Ella Li, Qinghao Wang, Haoran Wang, Yu Guo, Chang Ma, Lingpeng Kong
Communities: AI4CE, LM&Ro, LM&MA, ELM, LLMAG
05 Dec 2024

Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
Cameron Tice, Philipp Alexander Kreer, Nathan Helm-Burger, Prithviraj Singh Shahani, Fedor Ryzhenkov, Jacob Haimes, Felix Hofstätter, Teun van der Weij
02 Dec 2024

Linear Probe Penalties Reduce LLM Sycophancy
Henry Papadatos, Rachel Freedman
Communities: LLMSV
01 Dec 2024

Prompt-Guided Internal States for Hallucination Detection of Large Language Models
Fujie Zhang, Peiqi Yu, Biao Yi, Baolei Zhang, Tong Li, Zheli Liu
Communities: HILM, LRM
07 Nov 2024

Distinguishing Ignorance from Error in LLM Hallucinations
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov
Communities: HILM
29 Oct 2024

Chatting with Bots: AI, Speech Acts, and the Edge of Assertion
Iwan Williams, Tim Bayne
22 Oct 2024

Evaluating Language Model Character Traits
Francis Rhys Ward, Zejia Yang, Alex Jackson, Randy Brown, Chandler Smith, Grace Colverd, Louis Thomson, Raymond Douglas, Patrik Bartak, Andrew Rowan
05 Oct 2024

Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language
Anthony Costarelli, Mat Allen, Severin Field
03 Oct 2024

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov
Communities: HILM, AIFin
03 Oct 2024

A Survey on the Honesty of Large Language Models
Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, ..., Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam
Communities: HILM
27 Sep 2024

On the Relationship between Truth and Political Bias in Language Models
S. Fulay, William Brannon, Shrestha Mohanty, Cassandra Overney, Elinor Poole-Dayan, Deb Roy, Jad Kabbara
Communities: HILM
09 Sep 2024

Identifying the Source of Generation for Large Language Models
Bumjin Park, Jaesik Choi
05 Jul 2024

Truth is Universal: Robust Detection of Lies in LLMs
Lennart Bürger, Fred Hamprecht, B. Nadler
Communities: HILM
03 Jul 2024

Does ChatGPT Have a Mind?
Simon Goldstein, B. Levinstein
Communities: AI4MH, LRM
27 Jun 2024

Standards for Belief Representations in LLMs
Daniel A. Herrmann, B. Levinstein
31 May 2024

CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control
Huanshuo Liu, Hao Zhang, Zhijiang Guo, Kuicai Dong, Xiangyang Li, Yi Quan Lee, Cong Zhang, Yong Liu
Communities: 3DV
29 May 2024

An Assessment of Model-On-Model Deception
Julius Heitkoetter, Michael Gerovitch, Laker Newhouse
10 May 2024

Truth-value judgment in language models: 'truth directions' are context sensitive
Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen
Communities: KELM
29 Apr 2024

Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov
Communities: HILM
15 Apr 2024

Language Models in Dialogue: Conversational Maxims for Human-AI Interactions
Erik Miehling, Manish Nagireddy, P. Sattigeri, Elizabeth M. Daly, David Piorkowski, John T. Richards
Communities: ALM
22 Mar 2024

The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe
12 Jan 2024

Are Language Models More Like Libraries or Like Librarians? Bibliotechnism, the Novel Reference Problem, and the Attitudes of LLMs
Harvey Lederman, Kyle Mahowald
10 Jan 2024

Challenges with unsupervised LLM knowledge discovery
Sebastian Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah
15 Dec 2023

Weakly Supervised Detection of Hallucinations in LLM Activations
Miriam Rateike, C. Cintas, John Wamburu, Tanya Akumu, Skyler Speakman
05 Dec 2023

Honesty Is the Best Policy: Defining and Mitigating AI Deception
Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt
03 Dec 2023

Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching
James Campbell, Richard Ren, Phillip Guo
Communities: HILM
25 Nov 2023

A Survey of Confidence Estimation and Calibration in Large Language Models
Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych
Communities: UQCV
14 Nov 2023

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, ..., Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu
Communities: LRM, HILM
09 Nov 2023

Self-Consistency of Large Language Models under Ambiguity
Henning Bartsch, Ole Jorgensen, Domenic Rosati, Jason Hoelscher-Obermaier, Jacob Pfau
Communities: HILM
20 Oct 2023

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark
Communities: HILM
10 Oct 2023

AI Deception: A Survey of Examples, Risks, and Potential Solutions
Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks
28 Aug 2023

Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Stephen Casper, Jason Lin, Joe Kwon, Gatlen Culp, Dylan Hadfield-Menell
Communities: AAML
15 Jun 2023