Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
arXiv:2306.03341 · 6 June 2023
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
KELM, HILM
Papers citing "Inference-Time Intervention: Eliciting Truthful Answers from a Language Model" (50 of 409 papers shown)
Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective
Zhongxiang Sun, Qipeng Wang, Haoyu Wang, Xiao Zhang, Jun Xu
HILM, LRM · 19 May 2025
Contrastive Prompting Enhances Sentence Embeddings in LLMs through Inference-Time Steering
Zifeng Cheng, Zhonghui Wang, Yuchen Fu, Zhiwei Jiang, Yafeng Yin, Cong Wang, Qing Gu
19 May 2025
Truth Neurons
Haohang Li, Yupeng Cao, Yangyang Yu, Jordan W. Suchow, Zining Zhu
HILM, MILM, KELM · 18 May 2025
Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation
Chengwei Qin, Wenxuan Zhou, Karthik Abinav Sankararaman, Nanshu Wang, Tengyu Xu, ..., Aditya Tayade, Sinong Wang, Chenyu You, Han Fang, Hao Ma
HILM, LRM · 18 May 2025
ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch
LLMSV · 18 May 2025
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo, Zhaomin Wu, Philip S. Yu
18 May 2025
Interpretable Risk Mitigation in LLM Agent Systems
Jan Chojnacki
LLMAG · 15 May 2025
Exploring the generalization of LLM truth directions on conversational formats
Timour Ichmoukhamedov, David Martens
14 May 2025
On the Geometry of Semantics in Next-token Prediction
Yize Zhao, Christos Thrampoulidis
13 May 2025
Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering
Jessica Y. Bo, Tianyu Xu, Ishan Chatterjee, Katrina Passarella-Ward, Achin Kulshrestha, D Shin
LLMSV · 07 May 2025
What Is AI Safety? What Do We Want It to Be?
Jacqueline Harding, Cameron Domenico Kirk-Giannini
05 May 2025
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing, Guiming Hardy Chen, Ehsan Aghazadeh, Xin Eric Wang, Xinya Du
04 May 2025
On the Limitations of Steering in Language Model Alignment
Chebrolu Niranjan, Kokil Jaidka, G. Yeo
LLMSV · 02 May 2025
Do Large Language Models know who did what to whom?
Joseph M. Denning, Xiaohan, Bryor Snefjella, Idan A. Blank
23 Apr 2025
Functional Abstraction of Knowledge Recall in Large Language Models
Zijian Wang, Chang Xu
KELM · 20 Apr 2025
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Heng Chang, Zhiting Fan, Ruizhe Chen, Xiaotang Gai, Luqi Gong, Yan Zhang, Zuozhu Liu
LLMSV · 20 Apr 2025
Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey
Ahsan Bilal, Muhammad Ahmed Mohsin, Muhammad Umer, Muhammad Awais Khan Bangash, Muhammad Ali Jamshed
LLMAG, LRM, AI4CE · 20 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg
LRM · 19 Apr 2025
HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs
Sharanya Dasgupta, Sujoy Nath, Arkaprabha Basu, Pourya Shamsolmoali, Swagatam Das
HILM · 13 Apr 2025
Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection
MingShan Liu, Shi Bo, Jialing Fang
LRM · 13 Apr 2025
Robust Hallucination Detection in LLMs via Adaptive Token Selection
Mengjia Niu, Hamed Haddadi, Guansong Pang
HILM · 10 Apr 2025
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning
Zijian Wang, Chang Xu
LRM · 09 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen, Chenhao Tan
07 Apr 2025
Steering off Course: Reliability Challenges in Steering Language Models
Patrick Queiroz Da Silva, Hari Sethuraman, Dheeraj Rajagopal, Hannaneh Hajishirzi, Sachin Kumar
LLMSV · 06 Apr 2025
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Himabindu Lakkaraju, Yizhou Sun, Shichang Zhang
KELM · 03 Apr 2025
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Liangjie Huang, Dawei Li, Huan Liu, Lu Cheng
LRM · 03 Apr 2025
From Text to Graph: Leveraging Graph Neural Networks for Enhanced Explainability in NLP
Fabio Yáñez-Romero, Andrés Montoyo, Armando Suárez, Yoan Gutiérrez, Ruslan Mitkov
02 Apr 2025
The Illusionist's Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances
Yining Wang, Yansen Wang, Xi Li, Mi Zhang, Geng Hong, Min Yang
AAML, HILM · 01 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani, G M Shahariar, Sara Abdali, Lei Yu, Nael B. Abu-Ghazaleh, Yue Dong
AAML · 01 Apr 2025
Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts
Youxiang Zhu, Ruochen Li, Danqing Wang, Daniel Haehn, Xiaohui Liang
LRM · 30 Mar 2025
The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction
Yihuai Hong, Dian Zhou, Meng Cao, Lei Yu, Zhijing Jin
LRM · 29 Mar 2025
Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee, Melanie Weber, F. Viégas, Martin Wattenberg
FedML · 27 Mar 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao, Jiashu Qu, Jingyi Tang, Baolong Bi, Yi Liu, Hongyu Chen, Li Liang, Li Su, Qingming Huang
MLLM, VLM, LRM · 25 Mar 2025
Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models
Nariman Naderi, Seyed Amir Ahmad Safavi-Naini, Thomas Savage, Zahra Atf, Peter Lewis, Girish Nadkarni, Ali Soroush
ELM · 24 Mar 2025
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Baolong Bi, Shenghua Liu, Yansen Wang, Yilong Xu, Junfeng Fang, Lingrui Mei, Xueqi Cheng
KELM · 20 Mar 2025
Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
Shuo Li, Jiajun Sun, Guodong Zheng, Xiaoran Fan, Yujiong Shen, ..., Wenming Tan, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang
AAML, VLM · 19 Mar 2025
Inference-Time Intervention in Large Language Models for Reliable Requirement Verification
Paul Darm, James Xie, A. Riccardi
18 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji, L. Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda
18 Mar 2025
Learning on LLM Output Signatures for gray-box LLM Behavior Analysis
Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, Haggai Maron
18 Mar 2025
HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models
Xinyan Jiang, Hang Ye, Yongxin Zhu, Xiaoying Zheng, Zikang Chen, Jun Gong
17 Mar 2025
DAPI: Domain Adaptive Toxicity Probe Vector Intervention for Fine-Grained Detoxification
Cho Hyeonsu, Dooyoung Kim, Youngjoong Ko
MoMe · 17 Mar 2025
Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering
Xinyu Tang, Xiaolei Wang, Zhihao Lv, Yingqian Min, Wayne Xin Zhao, Binbin Hu, Ziqi Liu, Qing Cui
LRM · 14 Mar 2025
Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model
Qiyuan Deng, X. Bai, Kehai Chen, Yaowei Wang, Liqiang Nie, Min Zhang
OffRL · 13 Mar 2025
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan, Fei Kong, Hao-Ran Cheng, James Diffenderfer, B. Kailkhura, Lichao Sun, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu
13 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu, Dong Gong, Erdun Gao, Zhen Zhang, Biwei Huang, Anton van den Hengel, Javen Qinfeng Shi
12 Mar 2025
Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models
Shahnewaz Karim Sakib, Anindya Bijoy Das, Shibbir Ahmed
AAML · 12 Mar 2025
Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation
Yu Wang, Jiaxin Zhang, Xiang Gao, Wendi Cui, Peng Li, Kamalika Das
11 Mar 2025
Mitigating Memorization in LLMs using Activation Steering
Manan Suri, Nishit Anand, Amisha Bhaskar
LLMSV · 08 Mar 2025
Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs
Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke
LLMSV · 07 Mar 2025
SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs
Samir Abdaljalil, Hasan Kurban, Parichit Sharma, Erchin Serpedin, Rachad Atat
HILM · 07 Mar 2025