Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
6 June 2023 · arXiv:2306.03341
KELM, HILM

Papers citing "Inference-Time Intervention: Eliciting Truthful Answers from a Language Model" (50 of 411 shown)

SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs
Samir Abdaljalil, Hasan Kurban, Parichit Sharma, Erchin Serpedin, Rachad Atat
HILM · 07 Mar 2025

Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin, Ziqiao Wang, Yang You
FaML · 07 Mar 2025

Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs
Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke
LLMSV · 07 Mar 2025

This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs
Lorenz Wolf, Sangwoong Yoon, Ilija Bogunovic
07 Mar 2025

DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models
Y. Guo, Yuchen Yang, Zhe Chen, Pingjie Wang, Yusheng Liao, Yujie Zhang, Yanfeng Wang, Yu Wang
HILM · 05 Mar 2025

Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs' Decoding Layers
Zicong He, Boxuan Zhang, Lu Cheng
04 Mar 2025

Effectively Steer LLM To Follow Preference via Building Confident Directions
Bingqing Song, Boran Han, Shuai Zhang, Hao Wang, Haoyang Fang, Bonan Min, Yuyang Wang, Mingyi Hong
LLMSV · 04 Mar 2025

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling
Hang Zheng, Hongshen Xu, Yuncong Liu, Lu Chen, Pascale Fung, Kai Yu
04 Mar 2025

SAKE: Steering Activations for Knowledge Editing
Marco Scialanga, Thibault Laugel, Vincent Grari, Marcin Detyniecki
KELM, LLMSV · 03 Mar 2025

Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
Tingchen Fu, Fazl Barez
AAML · 03 Mar 2025

Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim, James Evans, Aaron Schein
03 Mar 2025

Personalize Your LLM: Fake it then Align it
Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala
02 Mar 2025

How to Steer LLM Latents for Hallucination Detection?
Seongheon Park, Xuefeng Du, Min-Hsuan Yeh, Haobo Wang, Yixuan Li
LLMSV · 01 Mar 2025

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu, Alexander Robey, Changliu Liu
AAML, LLMSV · 28 Feb 2025

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan, Robin Jia
KELM, MU · 27 Feb 2025

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang, Y. Zhang, Yinpeng Dong, Hang Su
HILM, KELM · 26 Feb 2025

Investigating Generalization of One-shot LLM Steering Vectors
Jacob Dunefsky, Arman Cohan
LLMSV · 26 Feb 2025

Steered Generation via Gradient Descent on Sparse Features
Sumanta Bhattacharyya, Pedram Rooshenas
LLMSV · 25 Feb 2025

Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple
24 Feb 2025

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models
Anirudh Sundar, Sinead Williamson, Katherine Metcalf, B. Theobald, Skyler Seto, Masha Fedzechkina
LLMSV · 24 Feb 2025

Is Free Self-Alignment Possible?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala
MoMe · 24 Feb 2025

LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao
MoMe · 24 Feb 2025

Uncertainty-Aware Fusion: An Ensemble Framework for Mitigating Hallucinations in Large Language Models
Prasenjit Dey, Srujana Merugu, Sivaramakrishnan Kaveri
HILM · 22 Feb 2025

Activation Steering in Neural Theorem Provers
Shashank Kirtania
LLMSV · 21 Feb 2025

Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans
Masha Fedzechkina, Eleonora Gualdoni, Sinead Williamson, Katherine Metcalf, Skyler Seto, B. Theobald
20 Feb 2025

Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection
Yihao Xue, Kristjan Greenewald, Youssef Mroueh, Baharan Mirzasoleiman
HILM · 20 Feb 2025

Multi-Attribute Steering of Language Models via Targeted Intervention
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Joey Tianyi Zhou
LLMSV · 18 Feb 2025

Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Daiki Chijiwa, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito, Susumu Takeuchi
18 Feb 2025

Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
ReLM, AI4TS, LRM · 18 Feb 2025

SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Mengnan Du
LLMSV · 17 Feb 2025

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xuben Wang, Yan Hu, Wenyu Du, Reynold Cheng, Benyou Wang, Difan Zou
17 Feb 2025

Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì, Andrea Seveso, Fabio Mercorio
LLMSV · 17 Feb 2025

Can ChatGPT Diagnose Alzheimer's Disease?
Quoc Toan Nguyen, Linh Le, Xuan-The Tran, T. Do, Chin-Teng Lin
LM&MA · 10 Feb 2025

Task-driven Layerwise Additive Activation Intervention
Hieu Trung Nguyen, Bao Nguyen, Binh Nguyen, V. Nguyen
KELM · 10 Feb 2025

Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak, Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, Jonathan Gratch
08 Feb 2025

Learning Task Representations from In-Context Learning
Baturay Saglam, Zhuoran Yang, Dionysis Kalogerias, Amin Karbasi
08 Feb 2025

SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen, Dongrui Liu, Tao Luo, Jing Shao
LRM, MILM · 07 Feb 2025

Enhancing Hallucination Detection through Noise Injection
Litian Liu, Reza Pourreza, Sunny Panchal, Apratim Bhattacharyya, Yao Qin, Roland Memisevic
HILM · 06 Feb 2025

Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models
Jialiang Wu, Yi Shen, Sijia Liu, Yi Tang, Sen Song, Xiaoyi Wang, Longjun Cai
05 Feb 2025

Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, Marta Villegas
KELM · 04 Feb 2025

Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
Mingi Jung, Saehuyng Lee, Eunji Kim, Sungroh Yoon
03 Feb 2025

Selective Response Strategies for GenAI
Boaz Taitler, Omer Ben-Porat
02 Feb 2025

Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency
Sazzad Hossain, Touhidul Alam Seyam, Avijit Chowdhury, Munis Xamidov, Rajib Ghose, Abhijit Pathak
30 Jan 2025

On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models
Naman Goel
HILM · 28 Jan 2025

Risk-Aware Distributional Intervention Policies for Language Models
Bao Nguyen, Binh Nguyen, Duy Nguyen, V. Nguyen
28 Jan 2025

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators
Dingkang Yang, Dongling Xiao, Jinjie Wei, Mingcheng Li, Zhaoyu Chen, Ke Li, Li Zhang
HILM · 28 Jan 2025

Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
J. Yang, Dapeng Chen, Yajing Sun, Rongjun Li, Zhiyong Feng, Wei Peng
19 Jan 2025

Reliable Text-to-SQL with Adaptive Abstention
Kaiwen Chen, Yueting Chen, Xiaohui Yu, Nick Koudas
RALM · 18 Jan 2025

Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment
Pegah Khayatan, Mustafa Shukor, Jayneel Parekh, Matthieu Cord
LLMSV · 06 Jan 2025

Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking
Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Zhicheng Dou
HILM, LRM · 02 Jan 2025