ResearchTrend.AI

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
arXiv:2306.03341
6 June 2023
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
KELM · HILM

Papers citing "Inference-Time Intervention: Eliciting Truthful Answers from a Language Model"

50 / 411 papers shown

SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs
Samir Abdaljalil, Hasan Kurban, Parichit Sharma, Erchin Serpedin, Rachad Atat
HILM · 07 Mar 2025

Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin, Ziqiao Wang, Yang You
FaML · 07 Mar 2025

Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs
Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke
LLMSV · 07 Mar 2025

This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs
Lorenz Wolf, Sangwoong Yoon, Ilija Bogunovic
07 Mar 2025

DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models
Y. Guo, Yuchen Yang, Zhe Chen, Pingjie Wang, Yusheng Liao, Yujie Zhang, Yanfeng Wang, Yu Wang
HILM · 05 Mar 2025

Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs' Decoding Layers
Zicong He, Boxuan Zhang, Lu Cheng
04 Mar 2025

Effectively Steer LLM To Follow Preference via Building Confident Directions
Bingqing Song, Boran Han, Shuai Zhang, Hao Wang, Haoyang Fang, Bonan Min, Yuyang Wang, Mingyi Hong
LLMSV · 04 Mar 2025

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling
Hang Zheng, Hongshen Xu, Yuncong Liu, Lu Chen, Pascale Fung, Kai Yu
04 Mar 2025

SAKE: Steering Activations for Knowledge Editing
Marco Scialanga, Thibault Laugel, Vincent Grari, Marcin Detyniecki
KELM · LLMSV · 03 Mar 2025

Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
Tingchen Fu, Fazl Barez
AAML · 03 Mar 2025

Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim, James Evans, Aaron Schein
03 Mar 2025

Personalize Your LLM: Fake it then Align it
Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala
02 Mar 2025

How to Steer LLM Latents for Hallucination Detection?
Seongheon Park, Xuefeng Du, Min-Hsuan Yeh, Haobo Wang, Yixuan Li
LLMSV · 01 Mar 2025

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu, Alexander Robey, Changliu Liu
AAML · LLMSV · 28 Feb 2025

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan, Robin Jia
KELM · MU · 27 Feb 2025

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang, Y. Zhang, Yinpeng Dong, Hang Su
HILM · KELM · 26 Feb 2025

Investigating Generalization of One-shot LLM Steering Vectors
Jacob Dunefsky, Arman Cohan
LLMSV · 26 Feb 2025

Steered Generation via Gradient Descent on Sparse Features
Sumanta Bhattacharyya, Pedram Rooshenas
LLMSV · 25 Feb 2025

Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple
24 Feb 2025

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models
Anirudh Sundar, Sinead Williamson, Katherine Metcalf, B. Theobald, Skyler Seto, Masha Fedzechkina
LLMSV · 24 Feb 2025

Is Free Self-Alignment Possible?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala
MoMe · 24 Feb 2025

LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao
MoMe · 24 Feb 2025

Uncertainty-Aware Fusion: An Ensemble Framework for Mitigating Hallucinations in Large Language Models
Prasenjit Dey, Srujana Merugu, Sivaramakrishnan Kaveri
HILM · 22 Feb 2025

Activation Steering in Neural Theorem Provers
Shashank Kirtania
LLMSV · 21 Feb 2025

Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans
Masha Fedzechkina, Eleonora Gualdoni, Sinead Williamson, Katherine Metcalf, Skyler Seto, B. Theobald
20 Feb 2025

Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection
Yihao Xue, Kristjan Greenewald, Youssef Mroueh, Baharan Mirzasoleiman
HILM · 20 Feb 2025

Multi-Attribute Steering of Language Models via Targeted Intervention
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Joey Tianyi Zhou
LLMSV · 18 Feb 2025

Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Daiki Chijiwa, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito, Susumu Takeuchi
18 Feb 2025

Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
ReLM · AI4TS · LRM · 18 Feb 2025

SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Mengnan Du
LLMSV · 17 Feb 2025

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xuben Wang, Yan Hu, Wenyu Du, Reynold Cheng, Benyou Wang, Difan Zou
17 Feb 2025

Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì, Andrea Seveso, Fabio Mercorio
LLMSV · 17 Feb 2025

Can ChatGPT Diagnose Alzheimer's Disease?
Quoc Toan Nguyen, Linh Le, Xuan-The Tran, T. Do, Chin-Teng Lin
LM&MA · 10 Feb 2025

Task-driven Layerwise Additive Activation Intervention
Hieu Trung Nguyen, Bao Nguyen, Binh Nguyen, V. Nguyen
KELM · 10 Feb 2025

Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak, Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, Jonathan Gratch
08 Feb 2025

Learning Task Representations from In-Context Learning
Baturay Saglam, Zhuoran Yang, Dionysis Kalogerias, Amin Karbasi
08 Feb 2025

SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen, Dongrui Liu, Tao Luo, Jing Shao
LRM · MILM · 07 Feb 2025

Enhancing Hallucination Detection through Noise Injection
Litian Liu, Reza Pourreza, Sunny Panchal, Apratim Bhattacharyya, Yao Qin, Roland Memisevic
HILM · 06 Feb 2025

Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models
Jialiang Wu, Yi Shen, Sijia Liu, Yi Tang, Sen Song, Xiaoyi Wang, Longjun Cai
05 Feb 2025

Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, Marta Villegas
KELM · 04 Feb 2025

Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
Mingi Jung, Saehuyng Lee, Eunji Kim, Sungroh Yoon
03 Feb 2025

Selective Response Strategies for GenAI
Boaz Taitler, Omer Ben-Porat
02 Feb 2025

Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency
Sazzad Hossain, Touhidul Alam Seyam, Avijit Chowdhury, Munis Xamidov, Rajib Ghose, Abhijit Pathak
30 Jan 2025

On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models
Naman Goel
HILM · 28 Jan 2025

Risk-Aware Distributional Intervention Policies for Language Models
Bao Nguyen, Binh Nguyen, Duy Nguyen, V. Nguyen
28 Jan 2025

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators
Dingkang Yang, Dongling Xiao, Jinjie Wei, Mingcheng Li, Zhaoyu Chen, Ke Li, Li Zhang
HILM · 28 Jan 2025

Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
J. Yang, Dapeng Chen, Yajing Sun, Rongjun Li, Zhiyong Feng, Wei Peng
19 Jan 2025

Reliable Text-to-SQL with Adaptive Abstention
Kaiwen Chen, Yueting Chen, Xiaohui Yu, Nick Koudas
RALM · 18 Jan 2025

Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment
Pegah Khayatan, Mustafa Shukor, Jayneel Parekh, Matthieu Cord
LLMSV · 06 Jan 2025

Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking
Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Zhicheng Dou
HILM · LRM · 02 Jan 2025