ResearchTrend.AI
Locating and Editing Factual Associations in GPT

10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
    KELM

Papers citing "Locating and Editing Factual Associations in GPT"

50 / 1,056 papers shown
Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models
Duanyu Feng
Yongfu Dai
Jimin Huang
Yifang Zhang
Qianqian Xie
Weiguang Han
Zhengyu Chen
Alejandro Lopez-Lira
Hao Wang
94
12
0
01 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
Xuansheng Wu
Wenlin Yao
Jianshu Chen
Xiaoman Pan
Xiaoyang Wang
Ninghao Liu
Dong Yu
LRM
97
33
0
30 Sep 2023
RelBERT: Embedding Relations with Language Models
Asahi Ushio
Jose Camacho-Collados
Steven Schockaert
KELM
82
1
0
30 Sep 2023
Medical Foundation Models are Susceptible to Targeted Misinformation Attacks
T. Han
S. Nebelung
Firas Khader
Tian Wang
Gustav Mueller-Franzes
...
Jens Kleesiek
Christoph Haarburger
Keno K. Bressem
Jakob Nikolas Kather
Daniel Truhn
AAML
44
6
0
29 Sep 2023
KLoB: a Benchmark for Assessing Knowledge Locating Methods in Language Models
Yiming Ju
Zheng Zhang
KELM
61
9
0
28 Sep 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang
Neel Nanda
LLMSV
209
115
0
27 Sep 2023
MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering
Takuya Higuchi
Shaochen Xu
Avamarie Brueggeman
Zheng Liu
Tianming Liu
Xiang Li
Ninghao Liu
RALM
102
14
0
27 Sep 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
Valentin Barriere
Felipe del Rio
Andres Carvallo De Ferari
Carlos Aspillaga
Eugenio Herrera-Berg
Cristian Buc Calderon
DiffM
78
0
0
27 Sep 2023
Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey
Victoria Smith
Ali Shahin Shamsabadi
Carolyn Ashurst
Adrian Weller
PILM
110
27
0
27 Sep 2023
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul
Varun Chandrasekaran
Erik Jones
Suriya Gunasekar
Ranjita Naik
Hamid Palangi
Ece Kamar
Besmira Nushi
HILM
67
49
0
26 Sep 2023
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Lorenzo Pacchiardi
A. J. Chan
Sören Mindermann
Ilan Moscovitz
Alexa Y. Pan
Y. Gal
Owain Evans
J. Brauner
LLMAG HILM
86
54
0
26 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
115
207
0
26 Sep 2023
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
182
159
0
25 Sep 2023
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems
Leonardo Ranaldi
Fabio Massimo Zanzotto
75
3
0
21 Sep 2023
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Lukas Berglund
Meg Tong
Max Kaufmann
Mikita Balesni
Asa Cooper Stickland
Tomasz Korbak
Owain Evans
LRM
195
279
0
21 Sep 2023
Knowledge Sanitization of Large Language Models
Yoichi Ishibashi
Hidetoshi Shimodaira
KELM
129
25
0
21 Sep 2023
Rigorously Assessing Natural Language Explanations of Neurons
Jing-ling Huang
Atticus Geiger
Karel D'Oosterlinck
Zhengxuan Wu
Christopher Potts
MILM
77
29
0
19 Sep 2023
Cross-Lingual Knowledge Editing in Large Language Models
Jiaan Wang
Yunlong Liang
Zengkui Sun
Yu Cao
Jiarong Xu
Fandong Meng
KELM
86
12
0
16 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
114
29
0
14 Sep 2023
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen
Ravid Schwartz-Ziv
Kyunghyun Cho
Matthew L. Leavitt
Naomi Saphra
145
74
0
13 Sep 2023
Circuit Breaking: Removing Model Behaviors with Targeted Ablation
Maximilian Li
Xander Davies
Max Nadeau
KELM MU
81
29
0
12 Sep 2023
Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
Mansi Sakarvadia
Aswathy Ajith
Arham Khan
Daniel Grzenda
Nathaniel Hudson
André Bauer
Kyle Chard
Ian Foster
KELM LRM
96
17
0
11 Sep 2023
Neurons in Large Language Models: Dead, N-gram, Positional
Elena Voita
Javier Ferrando
Christoforos Nalmpantis
MILM
166
56
0
09 Sep 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Sarah Schwettmann
Tamar Rott Shaham
Joanna Materzyńska
Neil Chowdhury
Shuang Li
Jacob Andreas
David Bau
Antonio Torralba
56
22
0
07 Sep 2023
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Yung-Sung Chuang
Yujia Xie
Hongyin Luo
Yoon Kim
James R. Glass
Pengcheng He
HILM
83
167
0
07 Sep 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
Anh Tuan Luu
Wei Bi
Freda Shi
Shuming Shi
RALM LRM HILM
174
586
0
03 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jundong Li
LRM
110
472
0
02 Sep 2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
Neel Nanda
Andrew Lee
Martin Wattenberg
FAtt MILM
125
186
0
02 Sep 2023
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
Varshini Subhash
Anna Bialas
Weiwei Pan
Finale Doshi-Velez
AAML
91
11
0
01 Sep 2023
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
Vedant Palit
Rohan Pandey
Aryaman Arora
Paul Pu Liang
88
23
0
27 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Jing Liu
286
31
0
27 Aug 2023
Unified Concept Editing in Diffusion Models
Rohit Gandikota
Hadas Orgad
Yonatan Belinkov
Joanna Materzyńska
David Bau
DiffM
110
192
0
25 Aug 2023
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
95
49
0
25 Aug 2023
Overcoming Generic Knowledge Loss with Selective Parameter Update
Wenxuan Zhang
Paul Janson
Rahaf Aljundi
Mohamed Elhoseiny
KELM CLL
122
12
0
23 Aug 2023
Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models
Adrián Csiszárik
M. Kiss
Péter Kőrösi-Szabó
Márton Muntag
Gergely Papp
D. Varga
MoMe
61
1
0
22 Aug 2023
GradientCoin: A Peer-to-Peer Decentralized Large Language Models
Yeqi Gao
Zhao Song
Junze Yin
89
18
0
21 Aug 2023
Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs
Suhang Wu
Minlong Peng
Yue Chen
Jinsong Su
Mingming Sun
KELM
88
39
0
19 Aug 2023
Linearity of Relation Decoding in Transformer Language Models
Evan Hernandez
Arnab Sen Sharma
Tal Haklay
Kevin Meng
Martin Wattenberg
Jacob Andreas
Yonatan Belinkov
David Bau
KELM
92
100
0
17 Aug 2023
PMET: Precise Model Editing in a Transformer
Xiaopeng Li
Shasha Li
Shangwen Wang
Jing Yang
Jun Ma
Jie Yu
KELM
165
135
0
17 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
Hao Fei
KELM MU
96
30
0
16 Aug 2023
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Peng Wang
Ningyu Zhang
Bo Tian
Zekun Xi
Yunzhi Yao
...
Shuyang Cheng
Kangwei Liu
Yuansheng Ni
Guozhou Zheng
Huajun Chen
KELM
89
57
0
14 Aug 2023
Explaining Relation Classification Models with Semantic Extents
Lars Klöser
André Büsgen
Philipp Kohl
Bodo Kraft
Albert Zündorf
37
0
0
04 Aug 2023
Multimodal Neurons in Pretrained Text-Only Transformers
Sarah Schwettmann
Neil Chowdhury
Samuel J. Klein
David Bau
Antonio Torralba
MILM
94
32
0
03 Aug 2023
Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI
Avijit Ghosh
D. Lakshmi
51
3
0
02 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP VLM
67
11
0
02 Aug 2023
The Hydra Effect: Emergent Self-repair in Language Model Computations
Tom McGrath
Matthew Rahtz
János Kramár
Vladimir Mikulik
Shane Legg
MILM LRM
79
73
0
28 Jul 2023
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines
Matthew Barker
Emma Kallina
D. Ashok
Katherine M. Collins
Ashley Casovan
Adrian Weller
Ameet Talwalkar
Valerie Chen
Umang Bhatt
66
7
0
28 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM OffRL
162
535
0
27 Jul 2023
Evaluating the Ripple Effects of Knowledge Editing in Language Models
Roi Cohen
Eden Biran
Ori Yoran
Amir Globerson
Mor Geva
KELM
110
180
0
24 Jul 2023
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
Neel Guha
Mayee F. Chen
Kush S. Bhatia
Azalia Mirhoseini
Frederic Sala
Christopher Ré
78
4
0
20 Jul 2023