arXiv: 1906.04341
What Does BERT Look At? An Analysis of BERT's Attention
11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
Papers citing
"What Does BERT Look At? An Analysis of BERT's Attention"
50 / 885 papers shown
Profile Consistency Identification for Open-domain Dialogue Agents
Haoyu Song
Yan Wang
Weinan Zhang
Zhengyu Zhao
Ting Liu
Xiaojiang Liu
24
29
0
21 Sep 2020
Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Jonathan Pilault
Amine Elhattami
C. Pal
CLL
MoE
27
89
0
19 Sep 2020
The birth of Romanian BERT
Stefan Daniel Dumitrescu
Andrei-Marius Avram
S. Pyysalo
VLM
8
76
0
18 Sep 2020
Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
Rajiv Movva
Jason Zhao
18
12
0
17 Sep 2020
Deep Learning Approaches for Extracting Adverse Events and Indications of Dietary Supplements from Clinical Text
Yadan Fan
Sicheng Zhou
Yifan Li
Rui Zhang
13
18
0
16 Sep 2020
Syntax Role for Neural Semantic Role Labeling
Z. Li
Hai Zhao
Shexia He
Jiaxun Cai
NAI
14
19
0
12 Sep 2020
Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability
Mayank Chhipa
Hrushikesh Mahesh Vazurkar
Abhijeet Kumar
Mridul Mishra
15
0
0
09 Sep 2020
Visually Analyzing Contextualized Embeddings
M. Berger
22
13
0
05 Sep 2020
Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
Joseph F DeRose
Jiayao Wang
M. Berger
17
83
0
03 Sep 2020
Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity
Cong Guo
B. Hsueh
Jingwen Leng
Yuxian Qiu
Yue Guan
Zehuan Wang
Xiaoying Jia
Xipeng Li
M. Guo
Yuhao Zhu
35
1
0
29 Aug 2020
Language Models as Emotional Classifiers for Textual Conversations
Connor T. Heaton
David M. Schwartz
16
6
0
27 Aug 2020
Entity and Evidence Guided Relation Extraction for DocRED
Kevin Huang
Guangtao Wang
Tengyu Ma
Jing Huang
25
9
0
27 Aug 2020
Analysis and Evaluation of Language Models for Word Sense Disambiguation
Daniel Loureiro
Kiamehr Rezaee
Mohammad Taher Pilehvar
Jose Camacho-Collados
24
13
0
26 Aug 2020
Do Syntax Trees Help Pre-trained Transformers Extract Information?
Devendra Singh Sachan
Yuhao Zhang
Peng Qi
William L. Hamilton
6
78
0
20 Aug 2020
AutoKG: Constructing Virtual Knowledge Graphs from Unstructured Documents for Question Answering
Seunghak Yu
Tianxing He
James R. Glass
14
5
0
20 Aug 2020
On the Importance of Local Information in Transformer Based Models
Madhura Pande
Aakriti Budhraja
Preksha Nema
Pratyush Kumar
Mitesh M. Khapra
25
2
0
13 Aug 2020
On Commonsense Cues in BERT for Solving Commonsense Tasks
Leyang Cui
Sijie Cheng
Yu Wu
Yue Zhang
SSL
CML
LRM
34
14
0
10 Aug 2020
Better Fine-Tuning by Reducing Representational Collapse
Armen Aghajanyan
Akshat Shrivastava
Anchit Gupta
Naman Goyal
Luke Zettlemoyer
S. Gupta
AAML
47
209
0
06 Aug 2020
Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Manoel Veríssimo dos Santos Neto
Ayrton Amaral
Nádia Félix F. da Silva
A. S. Soares
11
4
0
28 Jul 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
285
2,017
0
28 Jul 2020
IDS at SemEval-2020 Task 10: Does Pre-trained Language Model Know What to Emphasize?
Jaeyoul Shin
Taeuk Kim
Sang-goo Lee
20
1
0
24 Jul 2020
Add a SideNet to your MainNet
Adrien Morisot
14
0
0
14 Jul 2020
Can neural networks acquire a structural bias from raw linguistic data?
Alex Warstadt
Samuel R. Bowman
AI4CE
20
53
0
14 Jul 2020
BERT Learns (and Teaches) Chemistry
Josh Payne
Mario Srouji
Dian Ang Yap
V. Kosaraju
17
10
0
11 Jul 2020
Knowledge-Aware Language Model Pretraining
Corby Rosset
Chenyan Xiong
M. Phan
Xia Song
Paul N. Bennett
Saurabh Tiwary
KELM
35
79
0
29 Jun 2020
Rethinking Positional Encoding in Language Pre-training
Guolin Ke
Di He
Tie-Yan Liu
6
291
0
28 Jun 2020
BERTology Meets Biology: Interpreting Attention in Protein Language Models
Jesse Vig
Ali Madani
L. Varshney
Caiming Xiong
R. Socher
Nazneen Rajani
29
288
0
26 Jun 2020
Memory Transformer
Andrey Kravchenko
Yuri Kuratov
Anton Peganov
Grigory V. Sapunov
RALM
15
64
0
20 Jun 2020
Why Attentions May Not Be Interpretable?
Bing Bai
Jian Liang
Guanhua Zhang
Hao Li
Kun Bai
Fei Wang
FAtt
25
56
0
10 Jun 2020
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun
Yin-Wen Chang
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
6
78
0
08 Jun 2020
Generative Adversarial Phonology: Modeling unsupervised phonetic and phonological learning with neural networks
Gašper Beguš
GAN
6
25
0
06 Jun 2020
Understanding Self-Attention of Self-Supervised Audio Transformers
Shu-Wen Yang
Andy T. Liu
Hung-yi Lee
22
27
0
05 Jun 2020
A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
Jie Cai
Zhengzhou Zhu
Ping Nie
Qian Liu
AAML
21
7
0
02 Jun 2020
Guiding Symbolic Natural Language Grammar Induction via Transformer-Based Sequence Probabilities
B. Goertzel
Andres Suarez Madrigal
Gino Yu
14
3
0
26 May 2020
Adversarial NLI for Factual Correctness in Text Summarisation Models
Mario Barrantes
Benedikt Herudek
Richard Wang
12
17
0
24 May 2020
The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs
Alexander Mehler
Bernhard Jussen
T. Geelhaar
Alexander Henlein
Giuseppe Abrami
Daniel Baumartz
Tolga Uslu
Wahed Hemati
27
8
0
21 May 2020
Table Search Using a Deep Contextualized Language Model
Zhiyu Zoey Chen
M. Trabelsi
J. Heflin
Yinan Xu
Brian D. Davison
LMTD
23
56
0
19 May 2020
Finding Experts in Transformer Models
Xavier Suau
Luca Zappella
N. Apostoloff
15
31
0
15 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
22
127
0
15 May 2020
A Mixture of h-1 Heads is Better than h Heads
Hao Peng
Roy Schwartz
Dianqi Li
Noah A. Smith
MoE
27
32
0
13 May 2020
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
Zhuosheng Zhang
Hai Zhao
Rui Wang
18
62
0
13 May 2020
On the Robustness of Language Encoders against Grammatical Errors
Fan Yin
Quanyu Long
Tao Meng
Kai-Wei Chang
33
34
0
12 May 2020
The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Luca Soldaini
Alessandro Moschitti
27
44
0
05 May 2020
What-if I ask you to explain: Explaining the effects of perturbations in procedural text
Dheeraj Rajagopal
Niket Tandon
Bhavana Dalvi
Peter Clarke
Eduard H. Hovy
25
14
0
04 May 2020
The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Mostafa Abdou
Vinit Ravishankar
Maria Barrett
Yonatan Belinkov
Desmond Elliott
Anders Søgaard
ReLM
LRM
62
34
0
04 May 2020
Similarity Analysis of Contextual Word Representation Models
John M. Wu
Yonatan Belinkov
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
James R. Glass
51
73
0
03 May 2020
Quantifying Attention Flow in Transformers
Samira Abnar
Willem H. Zuidema
60
776
0
02 May 2020
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao
H. Trivedi
A. Balasubramanian
Niranjan Balasubramanian
32
66
0
02 May 2020
Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models
Bill Yuchen Lin
Seyeon Lee
Rahul Khanna
Xiang Ren
AIMat
8
154
0
02 May 2020
Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?
Yada Pruksachatkun
Jason Phang
Haokun Liu
Phu Mon Htut
Xiaoyi Zhang
Richard Yuanzhe Pang
Clara Vania
Katharina Kann
Samuel R. Bowman
CLL
LRM
11
194
0
01 May 2020