2203.14680
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
28 March 2022
Mor Geva
Avi Caciularu
Ke Wang
Yoav Goldberg
KELM
Papers citing
"Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space"
50 / 272 papers shown
Title
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
Clement Neo
Shay B. Cohen
Fazl Barez
44
4
0
23 Feb 2024
Understanding and Patching Compositional Reasoning in LLMs
Zhaoyi Li
Gangwei Jiang
Hong Xie
Linqi Song
Defu Lian
Ying Wei
LRM
56
20
0
22 Feb 2024
The Hidden Space of Transformer Language Adapters
Jesujoba Oluwadara Alabi
Marius Mosbach
Matan Eyal
Dietrich Klakow
Mor Geva
56
7
1
20 Feb 2024
When Only Time Will Tell: Interpreting How Transformers Process Local Ambiguities Through the Lens of Restart-Incrementality
Brielen Madureira
Patrick Kahardipraja
David Schlangen
39
2
0
20 Feb 2024
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
Shahar Katz
Yonatan Belinkov
Mor Geva
Lior Wolf
63
10
1
20 Feb 2024
Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Zihan Qiu
Zeyu Huang
Youcheng Huang
Jie Fu
KELM
30
5
0
19 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
43
13
0
19 Feb 2024
Retrieval-Augmented Generation: Is Dense Passage Retrieval Retrieving?
Benjamin Z. Reichman
Larry Heck
24
2
0
16 Feb 2024
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Jundong Li
35
10
0
16 Feb 2024
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Chris Wendler
V. Veselovsky
Giovanni Monea
Robert West
56
97
0
16 Feb 2024
Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States
Hanyu Duan
Yi Yang
Kar Yan Tam
HILM
22
28
0
15 Feb 2024
Spectral Filters, Dark Signals, and Attention Sinks
Nicola Cancedda
64
16
0
14 Feb 2024
Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
Bilal Chughtai
Alan Cooney
Neel Nanda
HILM
KELM
35
16
0
11 Feb 2024
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat
Sayed Mohammad Vakilzadeh Hatefi
Maximilian Dreyer
Aakriti Jain
Thomas Wiegand
Sebastian Lapuschkin
Wojciech Samek
33
25
0
08 Feb 2024
How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning
Zeping Yu
Sophia Ananiadou
48
7
0
05 Feb 2024
Neighboring Perturbations of Knowledge Editing on Large Language Models
Jun-Yu Ma
Zhen-Hua Ling
Ningyu Zhang
Jia-Chen Gu
KELM
31
5
0
31 Jan 2024
Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao
Xianjun Yang
Tianyu Pang
Chao Du
Lei Li
Yu-Xiang Wang
William Yang Wang
34
54
0
30 Jan 2024
From Understanding to Utilization: A Survey on Explainability for Large Language Models
Haoyan Luo
Lucia Specia
56
20
0
23 Jan 2024
Universal Neurons in GPT2 Language Models
Wes Gurnee
Theo Horsley
Zifan Carl Guo
Tara Rezaei Kheirkhah
Qinyi Sun
Will Hathaway
Neel Nanda
Dimitris Bertsimas
MILM
102
37
0
22 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
34
87
0
11 Jan 2024
Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
Jia-Chen Gu
Haoyang Xu
Jun-Yu Ma
Pan Lu
Zhen-Hua Ling
Kai-Wei Chang
Nanyun Peng
KELM
33
35
0
09 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
77
96
0
03 Jan 2024
A Comprehensive Study of Knowledge Editing for Large Language Models
Ningyu Zhang
Yunzhi Yao
Bo Tian
Peng Wang
Shumin Deng
...
Lei Liang
Qing Cui
Xiao-Jun Zhu
Jun Zhou
Huajun Chen
KELM
47
76
0
02 Jan 2024
LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis
Jinwen He
Yujia Gong
Kai-xiang Chen
Zijin Lin
Chengán Wei
Yue Zhao
24
3
0
27 Dec 2023
Neuron-Level Knowledge Attribution in Large Language Models
Zeping Yu
Sophia Ananiadou
FAtt
KELM
29
7
0
19 Dec 2023
Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier
Sergey A. Saltykov
27
0
0
17 Dec 2023
Weight subcloning: direct initialization of transformers using larger pretrained ones
Mohammad Samragh
Mehrdad Farajtabar
Sachin Mehta
Raviteja Vemulapalli
Fartash Faghri
Devang Naik
Oncel Tuzel
Mohammad Rastegari
18
25
0
14 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
33
13
0
13 Dec 2023
Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models
Da Wu
Jing Yang
Kai Wang
LRM
18
5
0
06 Dec 2023
Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings
Andrea W Wen-Yi
David Mimno
30
14
0
29 Nov 2023
Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation
Haoyi Wu
Kewei Tu
140
3
0
26 Nov 2023
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks
Ting-Yun Chang
Jesse Thomason
Robin Jia
17
14
0
15 Nov 2023
Assessing Knowledge Editing in Language Models via Relation Perspective
Yifan Wei
Xiaoyan Yu
Huanhuan Ma
Fangyu Lei
Yixuan Weng
Ran Song
Kang Liu
KELM
39
15
0
15 Nov 2023
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
Shiwen Ni
Dingwei Chen
Chengming Li
Xiping Hu
Ruifeng Xu
Min Yang
KELM
MoMe
39
7
0
14 Nov 2023
In-context Learning and Gradient Descent Revisited
Gilad Deutch
Nadav Magar
Tomer Bar Natan
Guy Dar
28
9
0
13 Nov 2023
The Linear Representation Hypothesis and the Geometry of Large Language Models
Kiho Park
Yo Joong Choe
Victor Veitch
LLMSV
MILM
31
140
0
07 Nov 2023
Training Dynamics of Contextual N-Grams in Language Models
Lucia Quirke
Lovis Heindrich
Wes Gurnee
Neel Nanda
18
4
0
01 Nov 2023
Analyzing Vision Transformers for Image Classification in Class Embedding Space
Martina G. Vilas
Timothy Schaumlöffel
Gemma Roig
ViT
21
23
0
29 Oct 2023
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Wenlin Yao
KELM
74
133
0
24 Oct 2023
Unnatural language processing: How do language models handle machine-generated prompts?
Corentin Kervadec
Francesca Franzon
Marco Baroni
23
5
0
24 Oct 2023
Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks
Sunit Bhattacharya
Ondrej Bojar
27
8
0
24 Oct 2023
Function Vectors in Large Language Models
Eric Todd
Millicent Li
Arnab Sen Sharma
Aaron Mueller
Byron C. Wallace
David Bau
14
100
0
23 Oct 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
28
16
0
19 Oct 2023
Investigating semantic subspaces of Transformer sentence embeddings through linear structural probing
Dmitry Nikolaev
Sebastian Padó
46
5
0
18 Oct 2023
The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models
Aviv Slobodkin
Omer Goldman
Avi Caciularu
Ido Dagan
Shauli Ravfogel
HILM
LRM
47
24
0
18 Oct 2023
Unlocking Emergent Modularity in Large Language Models
Zihan Qiu
Zeyu Huang
Jie Fu
28
8
0
17 Oct 2023
Untying the Reversal Curse via Bidirectional Language Model Editing
Jun-Yu Ma
Jia-Chen Gu
Zhen-Hua Ling
Quan Liu
Cong Liu
KELM
79
36
0
16 Oct 2023
Self-Detoxifying Language Models via Toxification Reversal
Chak Tou Leong
Yi Cheng
Jiashuo Wang
Jian Wang
Wenjie Li
MU
24
30
0
14 Oct 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
Cunxiang Wang
Xiaoze Liu
Yuanhao Yue
Xiangru Tang
Tianhang Zhang
...
Linyi Yang
Jindong Wang
Xing Xie
Zheng-Wei Zhang
Yue Zhang
HILM
KELM
51
184
0
11 Oct 2023
Probing Large Language Models from A Human Behavioral Perspective
Xintong Wang
Xiaoyu Li
Xingshan Li
Chris Biemann
51
5
0
08 Oct 2023