Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.03729
Cited By
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
27 November 2023
Kevin Liu
Stephen Casper
Dylan Hadfield-Menell
Jacob Andreas
HILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?"
15 / 15 papers shown
Title
What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction
Eitan Wagner
Omri Abend
36
0
0
04 May 2025
Investigating task-specific prompts and sparse autoencoders for activation monitoring
Henk Tillman
Dan Mossing
LLMSV
50
0
0
28 Apr 2025
Continuum-Interaction-Driven Intelligence: Human-Aligned Neural Architecture via Crystallized Reasoning and Fluid Generation
Pengcheng Zhou
Zhiqiang Nie
Haochen Li
48
0
0
12 Apr 2025
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee
Yerin Hwang
Yongil Kim
Joonsuk Park
Kyomin Jung
ELM
72
5
0
28 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang
Pei Zhang
Baosong Yang
Derek F. Wong
Rui-cang Wang
LRM
50
4
0
17 Oct 2024
Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis
Daoyang Li
Mingyu Jin
Qingcheng Zeng
Mengnan Du
60
2
0
22 Sep 2024
Can Language Model Understand Word Semantics as A Chatbot? An Empirical Study of Language Model Internal External Mismatch
Jinman Zhao
Xueyan Zhang
Xingyu Yue
Weizhe Chen
Zifan Qian
Ruiyu Wang
LRM
34
0
0
21 Sep 2024
LLM Internal States Reveal Hallucination Risk Faced With a Query
Ziwei Ji
Delong Chen
Etsuko Ishii
Samuel Cahyawijaya
Yejin Bang
Bryan Wilie
Pascale Fung
HILM
LRM
36
19
0
03 Jul 2024
Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell
Taiming Lu
Muhan Gao
Kuai Yu
Adam Byerly
Daniel Khashabi
49
11
0
20 Jun 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
48
6
0
12 Apr 2024
Uncovering Latent Human Wellbeing in Language Model Embeddings
Pedro Freire
ChengCheng Tan
Adam Gleave
Dan Hendrycks
Scott Emmons
36
1
0
19 Feb 2024
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
102
169
0
10 Oct 2023
Truthful AI: Developing and governing AI that does not lie
Owain Evans
Owen Cotton-Barratt
Lukas Finnveden
Adam Bales
Avital Balwit
Peter Wills
Luca Righetti
William Saunders
HILM
233
109
0
13 Oct 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
226
405
0
24 Feb 2021
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
415
2,586
0
03 Sep 2019
1