Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.05262
Cited By
v1
v2
v3
v4
v5 (latest)
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
Title
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Siavash Golkar
Alberto Bietti
Mariel Pettee
Michael Eickenberg
M. Cranmer
...
Ruben Ohana
Liam Parker
Bruno Régaldo-Saint Blancard
Kyunghyun Cho
Shirley Ho
78
2
0
30 May 2024
TAIA: Large Language Models are Out-of-Distribution Data Learners
Shuyang Jiang
Yusheng Liao
Ya Zhang
Yu Wang
Yanfeng Wang
82
5
0
30 May 2024
Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback
Jingwei Sun
Zhixu Du
Yiran Chen
KELM
62
2
0
30 May 2024
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors
Renzhi Wang
Piji Li
KELM
76
6
0
29 May 2024
Evaluating the External and Parametric Knowledge Fusion of Large Language Models
Hao Zhang
Yuyang Zhang
Xiaoguang Li
Wenxuan Shi
Haonan Xu
...
Yasheng Wang
Lifeng Shang
Qun Liu
Yong Liu
Ruiming Tang
KELM
97
5
0
29 May 2024
Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning
Renzhi Wang
Piji Li
58
4
0
28 May 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
182
25
0
28 May 2024
Improved Generation of Adversarial Examples Against Safety-aligned LLMs
Qizhang Li
Yiwen Guo
Wangmeng Zuo
Hao Chen
AAML
SILM
89
7
0
28 May 2024
InversionView: A General-Purpose Method for Reading Information from Neural Activations
Xinting Huang
Madhur Panwar
Navin Goyal
Michael Hahn
104
5
0
27 May 2024
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Trishna Chakraborty
Erfan Shayegani
Zikui Cai
Nael B. Abu-Ghazaleh
M. Salman Asif
Yue Dong
Amit K. Roy-Chowdhury
Chengyu Song
88
18
0
27 May 2024
Balancing User Preferences by Social Networks: A Condition-Guided Social Recommendation Model for Mitigating Popularity Bias
Xingbo He
Wenqi Fan
Ruobing Wang
Yili Wang
Ying Wang
Shirui Pan
Xin Wang
CML
72
2
0
27 May 2024
Perturbation-Restrained Sequential Model Editing
Junjie Ma
Hong Wang
Haoyang Xu
Zhen-Hua Ling
Jia-Chen Gu
KELM
178
11
0
27 May 2024
Large Scale Knowledge Washing
Yu Wang
Ruihan Wu
Zexue He
Xinyu Chen
Julian McAuley
MU
KELM
156
9
0
26 May 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
151
15
0
26 May 2024
Linearly Controlled Language Generation with Performative Guarantees
Emily Cheng
Marco Baroni
Carmen Amo Alonso
109
3
0
24 May 2024
Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top
Keyuan Cheng
Muhammad Asif Ali
Shu Yang
Gang Lin
Yuxuan Zhai
Haoyang Fei
Ke Xu
Lu Yu
Lijie Hu
Di Wang
KELM
122
11
0
24 May 2024
Sparse Matrix in Large Language Model Fine-tuning
Haoze He
Juncheng Billy Li
Xuan Jiang
Heather Miller
MoE
92
5
0
24 May 2024
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng
Diego Doimo
Corentin Kervadec
Iuri Macocco
Jade Yu
Alessandro Laio
Marco Baroni
221
16
0
24 May 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Boshi Wang
Xiang Yue
Yu-Chuan Su
Huan Sun
LRM
151
50
0
23 May 2024
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Bernal Jiménez Gutiérrez
Yiheng Shu
Yu Gu
Michihiro Yasunaga
Yu-Chuan Su
RALM
CLL
150
48
0
23 May 2024
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
Peng Wang
Zexi Li
Ningyu Zhang
Ziwen Xu
Yunzhi Yao
Yong Jiang
Pengjun Xie
Fei Huang
Huajun Chen
KELM
CLL
130
34
0
23 May 2024
Automatically Identifying Local and Global Circuits with Linear Computation Graphs
Xuyang Ge
Fukang Zhu
Wentao Shu
Junxuan Wang
Zhengfu He
Xipeng Qiu
107
10
0
22 May 2024
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Rheeya Uppaal
Apratim De
Yiting He
Yiquao Zhong
Junjie Hu
168
7
0
22 May 2024
Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts
Baolong Bi
Shenghua Liu
Lingrui Mei
Yiwei Wang
Pengliang Ji
Xueqi Cheng
KELM
85
35
0
19 May 2024
BadActs: A Universal Backdoor Defense in the Activation Space
Biao Yi
Sishuo Chen
Yiming Li
Tong Li
Baolei Zhang
Zheli Liu
AAML
94
7
0
18 May 2024
Learnable Privacy Neurons Localization in Language Models
Ruizhe Chen
Tianxiang Hu
Yang Feng
Zuo-Qiang Liu
94
16
0
16 May 2024
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing
Ruizhe Chen
Yichen Li
Zikai Xiao
Zuo-Qiang Liu
KELM
91
14
0
15 May 2024
Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models
Anna A. Ivanova
Aalok Sathe
Benjamin Lipkin
Unnathi Kumar
S. Radkani
...
Leshem Choshen
Roger Levy
Evelina Fedorenko
Josh Tenenbaum
Jacob Andreas
85
28
0
15 May 2024
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov
Georg Lange
Neel Nanda
79
41
0
14 May 2024
Can Language Models Explain Their Own Classification Behavior?
Dane Sherburn
Bilal Chughtai
Owain Evans
69
1
0
13 May 2024
Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning
Masane Fuchi
Tomohiro Takagi
DiffM
VLM
117
15
0
12 May 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Zorik Gekhman
G. Yona
Roee Aharoni
Matan Eyal
Amir Feder
Roi Reichart
Jonathan Herzig
174
137
0
09 May 2024
Learned feature representations are biased by complexity, learning order, position, and more
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Katherine Hermann
AI4CE
FaML
SSL
OOD
92
10
0
09 May 2024
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao
Yuzhou Gu
Zhao Song
81
0
0
09 May 2024
A Causal Explainable Guardrails for Large Language Models
Zhixuan Chu
Yan Wang
Longfei Li
Peng Kuang
Zhan Qin
Kui Ren
LLMSV
105
9
0
07 May 2024
How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability
Jorge García-Carrasco
Alejandro Maté
Juan Trujillo
67
10
0
07 May 2024
FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference
Runheng Liu
Xingchen Xiao
Heyan Huang
Zewen Chi
Zhijing Wu
RALM
KELM
83
0
0
07 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
124
15
0
06 May 2024
To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
George-Octavian Barbulescu
Peter Triantafillou
MU
114
23
0
06 May 2024
Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation
Kaize Shi
Xueyao Sun
Qing Li
Guandong Xu
112
13
0
06 May 2024
Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
Qizhou Chen
Taolin Zhang
Xiaofeng He
Dongyang Li
Chengyu Wang
Longtao Huang
Hui Xue
CLL
KELM
118
15
0
06 May 2024
Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
Ruizhe Li
Yanjun Gao
KELM
90
8
0
06 May 2024
What does the Knowledge Neuron Thesis Have to do with Knowledge?
Jingcheng Niu
Andrew Liu
Zining Zhu
Gerald Penn
115
38
0
03 May 2024
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
Junsang Yoon
Akshat Gupta
Gopala Anumanchipalli
53
9
0
01 May 2024
KAN: Kolmogorov-Arnold Networks
Ziming Liu
Yixuan Wang
Sachin Vaidya
Fabian Ruehle
James Halverson
Marin Soljacic
Thomas Y. Hou
Max Tegmark
330
602
0
30 Apr 2024
Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Haeun Yu
Pepa Atanasova
Isabelle Augenstein
KELM
76
4
0
29 Apr 2024
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Jinghan Jia
Yihua Zhang
Yimeng Zhang
Jiancheng Liu
Bharat Runwal
James Diffenderfer
B. Kailkhura
Sijia Liu
MU
202
50
0
28 Apr 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova
James Zou
AAML
102
6
0
26 Apr 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLL
KELM
LRM
167
88
0
25 Apr 2024
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
VLM
CoGe
113
0
0
25 Apr 2024
Previous
1
2
3
...
12
13
14
...
20
21
22
Next