Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
arXiv:1906.04284 · 7 June 2019
Papers citing "Analyzing the Structure of Attention in a Transformer Language Model" (50 of 63 shown)
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov · 14 Mar 2025 · 0 citations

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen · 02 Mar 2025 · 2 citations

Retrieval Backward Attention without Additional Training: Enhance Embeddings of Large Language Models via Repetition
Yifei Duan, Raphael Shang, Deng Liang, Yongqiang Cai · 28 Feb 2025 · 0 citations

Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang · 24 Feb 2025 · 3 citations

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan · 13 Feb 2025 · 1 citation · [MoE, AI4CE]

Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Michael Toker, Ido Galil, Hadas Orgad, Rinon Gal, Yoad Tewel, Gal Chechik, Yonatan Belinkov · 12 Jan 2025 · 2 citations · [DiffM]

Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
Yanwen Huang, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao · 02 Jan 2025 · 0 citations

ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha, Brandon Reagen · 12 Oct 2024 · 0 citations · [OffRL, AI4CE]

Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective
Zhaotian Weng, Zijun Gao, Jerone Andrews, Jieyu Zhao · 03 Jul 2024 · 0 citations

DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish, Itamar Zimerman, Shady Abu Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes · 20 Jun 2024 · 13 citations · [Mamba]

On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin · 28 Feb 2024 · 17 citations

Attention Meets Post-hoc Interpretability: A Mathematical Perspective
Gianluigi Lopardo, F. Precioso, Damien Garreau · 05 Feb 2024 · 4 citations

On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series
Rita Kuznetsova, Alizée Pace, Manuel Burger, Hugo Yèche, Gunnar Rätsch · 15 Nov 2023 · 5 citations · [AI4TS]

Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori, Thomas Serre, Ellie Pavlick · 07 Nov 2023 · 7 citations

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan · 23 Oct 2023 · 40 citations · [LRM]

Disentangling the Linguistic Competence of Privacy-Preserving BERT
Stefan Arnold, Nils Kemmerzell, Annika Schreiner · 17 Oct 2023 · 0 citations

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning
Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong · 16 Oct 2023 · 4 citations

AI for the Generation and Testing of Ideas Towards an AI Supported Knowledge Development Environment
T. Selker · 17 Jul 2023 · 3 citations

Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization
Dongqi Pu, Yifa Wang, Vera Demberg · 26 May 2023 · 21 citations

End-to-End Simultaneous Speech Translation with Differentiable Segmentation
Shaolei Zhang, Yang Feng · 25 May 2023 · 17 citations

State Spaces Aren't Enough: Machine Translation Needs Attention
Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz · 25 Apr 2023 · 9 citations

LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model
Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, M. Zhang, Tat-Seng Chua · 13 Apr 2023 · 76 citations

Attention-likelihood relationship in transformers
Valeria Ruscio, Valentino Maiorca, Fabrizio Silvestri · 15 Mar 2023 · 1 citation

Interpretability in Activation Space Analysis of Transformers: A Focused Survey
Soniya Vijayakumar · 22 Jan 2023 · 3 citations · [AI4CE]

Dissociating language and thought in large language models
Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko · 16 Jan 2023 · 209 citations · [ELM, ReLM]

Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, A. Habibian · 05 Jan 2023 · 25 citations · [ViT]

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James R. Glass, Yulia Tsvetkov · 20 Dec 2022 · 44 citations

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li, Z. Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He · 07 Dec 2022 · 24 citations

Explanation on Pretraining Bias of Finetuned Vision Transformer
Bumjin Park, Jaesik Choi · 18 Nov 2022 · 1 citation · [ViT]

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Z. Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng-rong Li, Yuxiong He · 17 Nov 2022 · 11 citations

On the Explainability of Natural Language Processing Deep Models
Julia El Zini, M. Awad · 13 Oct 2022 · 82 citations

CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure
Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, Ming Gao · 07 Oct 2022 · 10 citations

Beware the Rationalization Trap! When Language Model Explainability Diverges from our Mental Models of Language
Rita Sevastjanova, Mennatallah El-Assady · 14 Jul 2022 · 9 citations · [LRM]

What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier · 22 May 2022 · 9 citations · [AI4CE]

Learning from Bootstrapping and Stepwise Reinforcement Reward: A Semi-Supervised Framework for Text Style Transfer
Zhengyuan Liu, Nancy F. Chen · 19 May 2022 · 1 citation

Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng, Po-Chun Hsu, Hung-yi Lee · 08 May 2022 · 8 citations · [SSL]

Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren · 06 Apr 2022 · 9 citations

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal · 30 Mar 2022 · 45 citations

Measuring the Mixing of Contextual Information in the Transformer
Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà · 08 Mar 2022 · 49 citations

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code
Yao Wan, Wei-Ye Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hairong Jin · 14 Feb 2022 · 105 citations

Exploiting a Zoo of Checkpoints for Unseen Tasks
Jiaji Huang, Qiang Qiu, Kenneth Ward Church · 05 Nov 2021 · 4 citations

Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li · 20 Oct 2021 · 44 citations

MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis
Hossein Aboutalebi, Maya Pavlova, Hayden Gunraj, M. Shafiee, A. Sabri, Amer Alaref, Alexander Wong · 12 Oct 2021 · 17 citations

Thinking Like Transformers
Gail Weiss, Yoav Goldberg, Eran Yahav · 13 Jun 2021 · 127 citations · [AI4CE]

FedNLP: An interpretable NLP System to Decode Federal Reserve Communications
Jean Lee, Hoyoul Luis Youn, Nicholas Stevens, Josiah Poon, S. Han · 11 Jun 2021 · 10 citations

FNet: Mixing Tokens with Fourier Transforms
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon · 09 May 2021 · 517 citations

Knowledge Neurons in Pretrained Transformers
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei · 18 Apr 2021 · 417 citations · [KELM, MU]

Supervising Model Attention with Human Explanations for Robust Natural Language Inference
Joe Stacey, Yonatan Belinkov, Marek Rei · 16 Apr 2021 · 45 citations

Pose Recognition with Cascade Transformers
Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Z. Tu · 14 Apr 2021 · 209 citations · [ViT]

Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh · 30 Mar 2021 · 564 citations · [ViT]