ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.05262
  4. Cited By
Locating and Editing Factual Associations in GPT
v1v2v3v4v5 (latest)

Locating and Editing Factual Associations in GPT

10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
    KELM
ArXiv (abs)PDFHTML

Papers citing "Locating and Editing Factual Associations in GPT"

50 / 1,056 papers shown
Title
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang
Benjamin Bergen
149
0
0
21 Apr 2025
Functional Abstraction of Knowledge Recall in Large Language Models
Functional Abstraction of Knowledge Recall in Large Language Models
Zijian Wang
Chang Xu
KELM
71
1
0
20 Apr 2025
MIB: A Mechanistic Interpretability Benchmark
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller
Atticus Geiger
Sarah Wiegreffe
Dana Arad
Iván Arcuschin
...
Alessandro Stolfo
Martin Tutek
Amir Zur
David Bau
Yonatan Belinkov
133
2
0
17 Apr 2025
GRAIL: Gradient-Based Adaptive Unlearning for Privacy and Copyright in LLMs
GRAIL: Gradient-Based Adaptive Unlearning for Privacy and Copyright in LLMs
Kun-Woo Kim
Ji-Hoon Park
Ju-Min Han
Seong-Whan Lee
MUPILM
118
1
0
17 Apr 2025
Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation
Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation
Linda He
Jue Wang
Maurice Weber
Shang Zhu
Ben Athiwaratkun
Ce Zhang
SyDaLRM
83
1
0
17 Apr 2025
SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation
SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation
Saransh Agrawal
Kuan-Hao Huang
MUKELM
108
0
0
17 Apr 2025
Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning
Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning
Saif Punjwani
Larry Heck
LRM
93
0
0
14 Apr 2025
Localized Cultural Knowledge is Conserved and Controllable in Large Language Models
Localized Cultural Knowledge is Conserved and Controllable in Large Language Models
V. Veselovsky
Berke Argin
Benedikt Stroebl
Chris Wendler
Robert West
James Evans
Thomas L. Griffiths
Arvind Narayanan
117
1
0
14 Apr 2025
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
Abhinav Joshi
A. Ahmad
Divyaksh Shukla
Ashutosh Modi
ReLMLRM
92
0
0
14 Apr 2025
Can We Edit LLMs for Long-Tail Biomedical Knowledge?
Can We Edit LLMs for Long-Tail Biomedical Knowledge?
Xinhao Yi
Jake Lever
Kevin Bryson
Zaiqiao Meng
KELM
82
0
0
14 Apr 2025
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective
Qi Liu
Jiaxin Mao
Ji-Rong Wen
LRM
79
1
0
10 Apr 2025
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Yixin Cao
Jiahao Ying
Yansen Wang
Xipeng Qiu
Xuanjing Huang
Yugang Jiang
ELM
107
2
0
10 Apr 2025
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
Hanqi Xiao
Yi-Lin Sung
Elias Stengel-Eskin
Joey Tianyi Zhou
MQ
104
0
0
10 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen
Chenhao Tan
81
1
0
07 Apr 2025
Steering off Course: Reliability Challenges in Steering Language Models
Steering off Course: Reliability Challenges in Steering Language Models
Patrick Queiroz Da Silva
Hari Sethuraman
Dheeraj Rajagopal
Hannaneh Hajishirzi
Sachin Kumar
LLMSV
102
2
0
06 Apr 2025
Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models
Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models
Mingyang Wang
Heike Adel
Lukas Lange
Yihong Liu
Ercong Nie
Jannik Strötgen
Hinrich Schütze
HILM
117
5
0
05 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
132
1
0
05 Apr 2025
Page Classification for Print Imaging Pipeline
Page Classification for Print Imaging Pipeline
Shaoyuan Xu
Cheng Lu
Mark Shaw
Peter Bauer
J. Allebach
VLM
100
1
0
03 Apr 2025
Noiser: Bounded Input Perturbations for Attributing Large Language Models
Noiser: Bounded Input Perturbations for Attributing Large Language Models
Mohammad Reza Ghasemi Madani
Aryo Pradipta Gema
Gabriele Sarti
Yu Zhao
Pasquale Minervini
Andrea Passerini
AAML
121
1
0
03 Apr 2025
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du
Weikai Li
Min Cai
Karim Saraipour
Zimin Zhang
Himabindu Lakkaraju
Yizhou Sun
Shichang Zhang
KELM
73
1
0
03 Apr 2025
InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation
InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation
Bowen Cao
Deng Cai
W. Lam
CLL
101
1
0
02 Apr 2025
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
Boshi Wang
Huan Sun
98
5
0
02 Apr 2025
Forward Learning with Differential Privacy
Forward Learning with Differential Privacy
Mingqian Feng
Zeliang Zhang
Jinyang Jiang
Yijie Peng
Chenliang Xu
98
0
0
01 Apr 2025
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B
Aleksandra Bakalova
Yana Veitsman
Xinting Huang
Michael Hahn
78
2
0
31 Mar 2025
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Sudong Wang
Yize Zhang
Yao Zhu
Jianing Li
Zizhe Wang
Yi Liu
Xiangyang Ji
356
1
0
31 Mar 2025
Leaking LoRa: An Evaluation of Password Leaks and Knowledge Storage in Large Language Models
Leaking LoRa: An Evaluation of Password Leaks and Knowledge Storage in Large Language Models
Ryan Marinelli
Magnus Eckhoff
PILM
89
0
0
29 Mar 2025
Effective Skill Unlearning through Intervention and Abstention
Effective Skill Unlearning through Intervention and Abstention
Yongce Li
Chung-En Sun
Tsui-Wei Weng
MU
455
1
0
27 Mar 2025
How do language models learn facts? Dynamics, curricula and hallucinations
How do language models learn facts? Dynamics, curricula and hallucinations
Nicolas Zucchet
J. Bornschein
Stephanie C. Y. Chan
Andrew Kyle Lampinen
Razvan Pascanu
Soham De
KELMHILMLRM
140
7
1
27 Mar 2025
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
Chenxi Wang
Jizhan Fang
Xiang Chen
Bozhong Tian
Ziwen Xu
Hong Chen
N. Zhang
KELM
143
0
0
26 Mar 2025
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay Kulkarni
Ge Yan
Chung-En Sun
Tuomas P. Oikarinen
Tsui-Wei Weng
77
0
0
25 Mar 2025
A Study into Investigating Temporal Robustness of LLMs
A Study into Investigating Temporal Robustness of LLMs
Jonas Wallat
Abdelrahman Abdallah
Adam Jatowt
Avishek Anand
76
3
0
21 Mar 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Yunzhi Yao
Jizhan Fang
Jia-Chen Gu
N. Zhang
Shumin Deng
Ningyu Zhang
Nanyun Peng
KELM
117
3
0
20 Mar 2025
LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
Ying Shen
Lifu Huang
104
2
0
20 Mar 2025
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Baolong Bi
Shenghua Liu
Yansen Wang
Yilong Xu
Sihang Li
Lingrui Mei
Xueqi Cheng
KELM
117
8
0
20 Mar 2025
Exploring Model Editing for LLM-based Aspect-Based Sentiment Classification
Exploring Model Editing for LLM-based Aspect-Based Sentiment Classification
Shichen Li
Zhongqing Wang
Zheyu Zhao
Yue Zhang
Peifeng Li
KELM
68
1
0
19 Mar 2025
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
Murong Yue
Ziyu Yao
SILMAAML
101
0
0
18 Mar 2025
SuperBPE: Space Travel for Language Models
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
168
10
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
154
4
0
17 Mar 2025
TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
Philip Quirke
Clement Neo
Abir Harrasse
Dhruv Nathawani
Luke Marks
Amir Abdullah
93
0
0
17 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
165
1
0
14 Mar 2025
Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing
Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing
Bhiman Kumar Baghel
Scott M. Jordan
Zheyuan Ryan Shi
Xiang Lorraine Li
KELM
107
0
0
14 Mar 2025
Taming Knowledge Conflicts in Language Models
Taming Knowledge Conflicts in Language Models
Gaotang Li
Yuzhong Chen
Hanghang Tong
KELM
93
2
0
14 Mar 2025
Safe Vision-Language Models via Unsafe Weights Manipulation
Safe Vision-Language Models via Unsafe Weights Manipulation
Moreno DÍncà
E. Peruzzo
Xingqian Xu
Humphrey Shi
N. Sebe
Massimiliano Mancini
MU
116
0
0
14 Mar 2025
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan
Fei Kong
Hao-Ran Cheng
James Diffenderfer
B. Kailkhura
Lichao Sun
Xiaofeng Zhu
Xiaoshuang Shi
Kaidi Xu
486
4
0
13 Mar 2025
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Jiuding Sun
Jing Huang
Sidharth Baskaran
Karel DÓosterlinck
Christopher Potts
Michael Sklar
Atticus Geiger
AI4CE
125
2
0
13 Mar 2025
C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion
Lijie Hu
Junchi Liao
Weimin Lyu
Shaopeng Fu
Tianhao Huang
Shu Yang
Guimin Hu
Di Wang
AAML
126
0
0
12 Mar 2025
ACE: Concept Editing in Diffusion Models without Performance Degradation
Ruipeng Wang
Sihang Li
Jiaqi Li
Hao Chen
Jie Shi
Kaidi Wang
Xinze Wang
DiffM
112
2
0
11 Mar 2025
BiasEdit: Debiasing Stereotyped Language Models via Model Editing
Xin Xu
Wei Xu
N. Zhang
Julian McAuley
KELM
137
1
0
11 Mar 2025
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Tianhe Lin
Jian Xie
Siyu Yuan
Deqing Yang
ReLMLRM
153
3
0
10 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
146
1
0
08 Mar 2025
Previous
12345...202122
Next