2310.15910
Cited By
Characterizing Mechanisms for Factual Recall in Language Models
24 October 2023
Qinan Yu
Jack Merullo
Ellie Pavlick
KELM
Papers citing "Characterizing Mechanisms for Factual Recall in Language Models"
21 / 21 papers shown
Taming Knowledge Conflicts in Language Models
Gaotang Li
Yuzhong Chen
Hanghang Tong
KELM
49
1
0
14 Mar 2025
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
55
3
0
11 Nov 2024
Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs
Xin Zhou
Ping Nie
Yiwen Guo
Haojie Wei
Zhanqiu Zhang
Pasquale Minervini
Ruotian Ma
Tao Gui
Qi Zhang
Xuanjing Huang
MoE
44
0
0
20 Oct 2024
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Gintare Karolina Dziugaite
KELM
MU
38
5
0
16 Oct 2024
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
ZhongXiang Sun
Xiaoxue Zang
Kai Zheng
Yang Song
Jun Xu
Xiao Zhang
Weijie Yu
Yang Song
Han Li
57
7
0
15 Oct 2024
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Yein Park
Chanwoong Yoon
Jungwoo Park
Donghyeon Lee
Minbyul Jeong
Jaewoo Kang
KELM
61
1
0
13 Oct 2024
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling
Ruochen Zhang
Qinan Yu
Matianyu Zang
Carsten Eickhoff
Ellie Pavlick
49
2
0
11 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
74
7
0
03 Oct 2024
PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead
Tao Tan
Yining Qian
Ang Lv
Hongzhan Lin
Songhao Wu
Yongbo Wang
Feng Wang
Jingtong Wu
Xin Lu
Rui Yan
22
1
0
29 Sep 2024
Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
Aaron Mueller
CML
33
10
0
05 Jul 2024
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
Yihuai Hong
Lei Yu
Shauli Ravfogel
Haiqin Yang
Mor Geva
KELM
MU
66
18
0
17 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després
Jinyue Feng
Zining Zhu
Frank Rudzicz
LRM
48
0
0
04 Jun 2024
Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
Martina G. Vilas
Federico Adolfi
David Poeppel
Gemma Roig
48
5
0
03 Jun 2024
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
Hanzhang Zhou
Zijian Feng
Zixiao Zhu
Junlang Qian
Kezhi Mao
44
6
0
31 May 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
40
114
0
22 Apr 2024
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
Ang Lv
Yuhan Chen
Kaiyi Zhang
Yulong Wang
Lifeng Liu
Ji-Rong Wen
Jian Xie
Rui Yan
KELM
37
16
0
28 Mar 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
46
115
0
28 Mar 2024
Observable Propagation: Uncovering Feature Vectors in Transformers
Jacob Dunefsky
Arman Cohan
38
2
0
26 Dec 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
191
261
0
28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
496
0
01 Nov 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
276
1,996
0
31 Dec 2020