Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

6 June 2023
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
    KELM
    HILM

Papers citing "Inference-Time Intervention: Eliciting Truthful Answers from a Language Model"

Showing 50 of 411 citing papers.
TAIA: Large Language Models are Out-of-Distribution Data Learners
Shuyang Jiang
Yusheng Liao
Ya Zhang
Yu Wang
Yanfeng Wang
29
3
0
30 May 2024
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
Alexander Nikitin
Jannik Kossen
Yarin Gal
Pekka Marttinen
UQCV
53
25
0
30 May 2024
AI Risk Management Should Incorporate Both Safety and Security
Xiangyu Qi
Yangsibo Huang
Yi Zeng
Edoardo Debenedetti
Jonas Geiping
...
Chaowei Xiao
Bo Li
Dawn Song
Peter Henderson
Prateek Mittal
AAML
51
11
0
29 May 2024
CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control
Huanshuo Liu
Hao Zhang
Zhijiang Guo
Kuicai Dong
Xiangyang Li
Yi Quan Lee
Cong Zhang
Yong Liu
3DV
45
1
0
29 May 2024
Calibrating Reasoning in Language Models with Internal Consistency
Zhihui Xie
Jizhou Guo
Tong Yu
Shuai Li
LRM
51
9
0
29 May 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
74
20
0
28 May 2024
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
Yuanpu Cao
Tianrong Zhang
Bochuan Cao
Ziyi Yin
Lu Lin
Fenglong Ma
Jinghui Chen
LLMSV
37
20
0
28 May 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
71
8
0
26 May 2024
No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks
Chak Tou Leong
Yi Cheng
Kaishuai Xu
Jian Wang
Hanlin Wang
Wenjie Li
AAML
51
18
0
25 May 2024
Linearly Controlled Language Generation with Performative Guarantees
Emily Cheng
Marco Baroni
Carmen Amo Alonso
48
3
0
24 May 2024
Implicit In-context Learning
Zhuowei Li
Zihao Xu
Ligong Han
Yunhe Gao
Song Wen
Di Liu
Hao Wang
Dimitris N. Metaxas
38
2
0
23 May 2024
Automatically Identifying Local and Global Circuits with Linear Computation Graphs
Xuyang Ge
Fukang Zhu
Wentao Shu
Junxuan Wang
Zhengfu He
Xipeng Qiu
27
10
0
22 May 2024
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction
Tingchen Fu
Deng Cai
Lemao Liu
Shuming Shi
Rui Yan
MoMe
62
13
0
22 May 2024
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Rheeya Uppaal
Apratim De
Yiting He
Yiqiao Zhong
Junjie Hu
41
9
0
22 May 2024
OLAPH: Improving Factuality in Biomedical Long-form Question Answering
Minbyul Jeong
Hyeon Hwang
Chanwoong Yoon
Taewhoo Lee
Jaewoo Kang
MedIm
HILM
LM&MA
48
12
0
21 May 2024
Spectral Editing of Activations for Large Language Model Alignment
Yifu Qiu
Zheng Zhao
Yftah Ziser
Anna Korhonen
Edoardo Ponti
Shay B. Cohen
KELM
LLMSV
28
16
0
15 May 2024
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov
Georg Lange
Neel Nanda
26
33
0
14 May 2024
Can Language Models Explain Their Own Classification Behavior?
Dane Sherburn
Bilal Chughtai
Owain Evans
47
1
0
13 May 2024
An Assessment of Model-On-Model Deception
Julius Heitkoetter
Michael Gerovitch
Laker Newhouse
42
3
0
10 May 2024
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
Mengjia Niu
Hao Li
Jie Shi
Hamed Haddadi
Fan Mo
HILM
51
10
0
10 May 2024
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Joshua Clymer
Caden Juang
Severin Field
CVBM
34
2
0
08 May 2024
A Causal Explainable Guardrails for Large Language Models
Zhixuan Chu
Yan Wang
Longfei Li
Peng Kuang
Zhan Qin
Kui Ren
LLMSV
52
7
0
07 May 2024
FLAME: Factuality-Aware Alignment for Large Language Models
Sheng-Chieh Lin
Luyu Gao
Barlas Oğuz
Wenhan Xiong
Jimmy Lin
Wen-tau Yih
Xilun Chen
HILM
41
16
0
02 May 2024
Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression
Farima Fatahi Bayat
Xin Liu
H. V. Jagadish
Lu Wang
HILM
KELM
33
3
0
01 May 2024
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
Aaron Jiaxun Li
Satyapriya Krishna
Himabindu Lakkaraju
48
3
0
29 Apr 2024
Truth-value judgment in language models: belief directions are context sensitive
Stefan F. Schouten
Peter Bloem
Ilia Markov
Piek Vossen
KELM
71
1
0
29 Apr 2024
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Xiaomin Yu
Yezhaohui Wang
Yanfang Chen
Zhen Tao
Dinghao Xi
Shichao Song
Pengnian Qi
Zhiyu Li
69
8
0
25 Apr 2024
Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
Linyu Liu
Yu Pan
Xiaocheng Li
Guanting Chen
38
25
0
24 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
84
47
0
23 Apr 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
40
117
0
22 Apr 2024
DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Yu Li
Zhihua Wei
Han Jiang
Chuanyang Gong
LLMSV
29
2
0
16 Apr 2024
Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience
Haixia Han
Tingyun Li
Shisong Chen
Jie Shi
Chengyu Du
Yanghua Xiao
Jiaqing Liang
Xin Lin
53
6
0
16 Apr 2024
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
Adi Simhi
Jonathan Herzig
Idan Szpektor
Yonatan Belinkov
HILM
54
11
0
15 Apr 2024
Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models
Souvik Das
Lifeng Jin
Linfeng Song
Haitao Mi
Baolin Peng
Dong Yu
HILM
40
2
0
14 Apr 2024
Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
Sara Kangaslahti
David Alvarez-Melis
KELM
34
0
0
10 Apr 2024
Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Mingyu Jin
Qinkai Yu
Jingyuan Huang
Qingcheng Zeng
Zhenting Wang
...
Yanda Meng
Kaize Ding
Fan Yang
Mengnan Du
Yongfeng Zhang
58
0
0
10 Apr 2024
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models
Giwon Hong
Aryo Pradipta Gema
Rohit Saxena
Xiaotang Du
Ping Nie
...
Laura Perez-Beltrachini
Max Ryabinin
Xuanli He
Clémentine Fourrier
Pasquale Minervini
LRM
HILM
38
11
0
08 Apr 2024
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Derui Zhu
Dingfan Chen
Qing Li
Zongxiong Chen
Lei Ma
Jens Grossklags
Mario Fritz
HILM
35
10
0
06 Apr 2024
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang
Marc Marone
Tianjian Li
Benjamin Van Durme
Daniel Khashabi
93
9
0
05 Apr 2024
ReFT: Representation Finetuning for Language Models
Zhengxuan Wu
Aryaman Arora
Zheng Wang
Atticus Geiger
Daniel Jurafsky
Christopher D. Manning
Christopher Potts
OffRL
38
58
0
04 Apr 2024
Test-Time Model Adaptation with Only Forward Passes
Shuaicheng Niu
Chunyan Miao
Guohao Chen
Pengcheng Wu
Peilin Zhao
TTA
48
19
0
02 Apr 2024
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
Junghyun Koo
Gordon Wichern
François Germain
Sameer Khurana
Jonathan Le Roux
34
3
0
02 Apr 2024
Is Factuality Decoding a Free Lunch for LLMs? Evaluation on Knowledge Editing Benchmark
Baolong Bi
Shenghua Liu
Yiwei Wang
Lingrui Mei
Xueqi Cheng
KELM
49
10
0
30 Mar 2024
On Large Language Models' Hallucination with Regard to Known Facts
Che Jiang
Biqing Qi
Xiangyu Hong
Dayuan Fu
Yang Cheng
Fandong Meng
Mo Yu
Bowen Zhou
Jie Zhou
HILM
LRM
39
16
0
29 Mar 2024
Localizing Paragraph Memorization in Language Models
Niklas Stoehr
Mitchell Gordon
Chiyuan Zhang
Owen Lewis
MU
38
13
0
28 Mar 2024
Non-Linear Inference Time Intervention: Improving LLM Truthfulness
Jakub Hoscilowicz
Adam Wiacek
Jan Chojnacki
Adam Cieślak
Leszek Michon
Vitalii Urbanevych
Artur Janicki
KELM
38
2
0
27 Mar 2024
Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
Lei Yu
Meng Cao
Jackie Chi Kit Cheung
Yue Dong
HILM
33
9
0
27 Mar 2024
Can multiple-choice questions really be useful in detecting the abilities of LLMs?
Wangyue Li
Liangzhi Li
Tong Xiang
Xiao Liu
Wei Deng
Noa Garcia
ELM
47
28
0
26 Mar 2024
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
Adam Karvonen
40
19
0
21 Mar 2024
Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
Moxin Li
Wenjie Wang
Fuli Feng
Fengbin Zhu
Qifan Wang
Tat-Seng Chua
HILM
LRM
46
15
0
15 Mar 2024