arXiv: 1906.04341
What Does BERT Look At? An Analysis of BERT's Attention
11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
Papers citing
"What Does BERT Look At? An Analysis of BERT's Attention"
50 / 885 papers shown
Profile Consistency Identification for Open-domain Dialogue Agents
Haoyu Song
Yan Wang
Weinan Zhang
Zhengyu Zhao
Ting Liu
Xiaojiang Liu
24
29
0
21 Sep 2020
Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Jonathan Pilault
Amine Elhattami
C. Pal
CLL
MoE
27
89
0
19 Sep 2020
The birth of Romanian BERT
Stefan Daniel Dumitrescu
Andrei-Marius Avram
S. Pyysalo
VLM
8
76
0
18 Sep 2020
Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
Rajiv Movva
Jason Zhao
18
12
0
17 Sep 2020
Deep Learning Approaches for Extracting Adverse Events and Indications of Dietary Supplements from Clinical Text
Yadan Fan
Sicheng Zhou
Yifan Li
Rui Zhang
13
18
0
16 Sep 2020
Syntax Role for Neural Semantic Role Labeling
Z. Li
Hai Zhao
Shexia He
Jiaxun Cai
NAI
14
19
0
12 Sep 2020
Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability
Mayank Chhipa
Hrushikesh Mahesh Vazurkar
Abhijeet Kumar
Mridul Mishra
15
0
0
09 Sep 2020
Visually Analyzing Contextualized Embeddings
M. Berger
22
13
0
05 Sep 2020
Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
Joseph F DeRose
Jiayao Wang
M. Berger
17
83
0
03 Sep 2020
Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity
Cong Guo
B. Hsueh
Jingwen Leng
Yuxian Qiu
Yue Guan
Zehuan Wang
Xiaoying Jia
Xipeng Li
M. Guo
Yuhao Zhu
35
1
0
29 Aug 2020
Language Models as Emotional Classifiers for Textual Conversations
Connor T. Heaton
David M. Schwartz
16
6
0
27 Aug 2020
Entity and Evidence Guided Relation Extraction for DocRED
Kevin Huang
Guangtao Wang
Tengyu Ma
Jing Huang
25
9
0
27 Aug 2020
Analysis and Evaluation of Language Models for Word Sense Disambiguation
Daniel Loureiro
Kiamehr Rezaee
Mohammad Taher Pilehvar
Jose Camacho-Collados
24
13
0
26 Aug 2020
Do Syntax Trees Help Pre-trained Transformers Extract Information?
Devendra Singh Sachan
Yuhao Zhang
Peng Qi
William L. Hamilton
6
78
0
20 Aug 2020
AutoKG: Constructing Virtual Knowledge Graphs from Unstructured Documents for Question Answering
Seunghak Yu
Tianxing He
James R. Glass
14
5
0
20 Aug 2020
On the Importance of Local Information in Transformer Based Models
Madhura Pande
Aakriti Budhraja
Preksha Nema
Pratyush Kumar
Mitesh M. Khapra
25
2
0
13 Aug 2020
On Commonsense Cues in BERT for Solving Commonsense Tasks
Leyang Cui
Sijie Cheng
Yu Wu
Yue Zhang
SSL
CML
LRM
34
14
0
10 Aug 2020
Better Fine-Tuning by Reducing Representational Collapse
Armen Aghajanyan
Akshat Shrivastava
Anchit Gupta
Naman Goyal
Luke Zettlemoyer
S. Gupta
AAML
47
209
0
06 Aug 2020
Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Manoel Veríssimo dos Santos Neto
Ayrton Amaral
Nádia Félix F. da Silva
A. S. Soares
11
4
0
28 Jul 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
285
2,017
0
28 Jul 2020
IDS at SemEval-2020 Task 10: Does Pre-trained Language Model Know What to Emphasize?
Jaeyoul Shin
Taeuk Kim
Sang-goo Lee
20
1
0
24 Jul 2020
Add a SideNet to your MainNet
Adrien Morisot
14
0
0
14 Jul 2020
Can neural networks acquire a structural bias from raw linguistic data?
Alex Warstadt
Samuel R. Bowman
AI4CE
20
53
0
14 Jul 2020
BERT Learns (and Teaches) Chemistry
Josh Payne
Mario Srouji
Dian Ang Yap
V. Kosaraju
17
10
0
11 Jul 2020
Knowledge-Aware Language Model Pretraining
Corby Rosset
Chenyan Xiong
M. Phan
Xia Song
Paul N. Bennett
Saurabh Tiwary
KELM
35
79
0
29 Jun 2020
Rethinking Positional Encoding in Language Pre-training
Guolin Ke
Di He
Tie-Yan Liu
6
291
0
28 Jun 2020
BERTology Meets Biology: Interpreting Attention in Protein Language Models
Jesse Vig
Ali Madani
L. Varshney
Caiming Xiong
R. Socher
Nazneen Rajani
29
288
0
26 Jun 2020
Memory Transformer
Andrey Kravchenko
Yuri Kuratov
Anton Peganov
Grigory V. Sapunov
RALM
15
64
0
20 Jun 2020
Why Attentions May Not Be Interpretable?
Bing Bai
Jian Liang
Guanhua Zhang
Hao Li
Kun Bai
Fei Wang
FAtt
25
56
0
10 Jun 2020
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun
Yin-Wen Chang
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
6
78
0
08 Jun 2020
Generative Adversarial Phonology: Modeling unsupervised phonetic and phonological learning with neural networks
Gašper Beguš
GAN
6
25
0
06 Jun 2020
Understanding Self-Attention of Self-Supervised Audio Transformers
Shu-Wen Yang
Andy T. Liu
Hung-yi Lee
22
27
0
05 Jun 2020
A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
Jie Cai
Zhengzhou Zhu
Ping Nie
Qian Liu
AAML
21
7
0
02 Jun 2020
Guiding Symbolic Natural Language Grammar Induction via Transformer-Based Sequence Probabilities
B. Goertzel
Andres Suarez Madrigal
Gino Yu
14
3
0
26 May 2020
Adversarial NLI for Factual Correctness in Text Summarisation Models
Mario Barrantes
Benedikt Herudek
Richard Wang
12
17
0
24 May 2020
The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs
Alexander Mehler
Bernhard Jussen
T. Geelhaar
Alexander Henlein
Giuseppe Abrami
Daniel Baumartz
Tolga Uslu
Wahed Hemati
27
8
0
21 May 2020
Table Search Using a Deep Contextualized Language Model
Zhiyu Zoey Chen
M. Trabelsi
J. Heflin
Yinan Xu
Brian D. Davison
LMTD
23
56
0
19 May 2020
Finding Experts in Transformer Models
Xavier Suau
Luca Zappella
N. Apostoloff
15
31
0
15 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
22
127
0
15 May 2020
A Mixture of h-1 Heads is Better than h Heads
Hao Peng
Roy Schwartz
Dianqi Li
Noah A. Smith
MoE
27
32
0
13 May 2020
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
Zhuosheng Zhang
Hai Zhao
Rui Wang
18
62
0
13 May 2020
On the Robustness of Language Encoders against Grammatical Errors
Fan Yin
Quanyu Long
Tao Meng
Kai-Wei Chang
33
34
0
12 May 2020
The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Luca Soldaini
Alessandro Moschitti
27
44
0
05 May 2020
What-if I ask you to explain: Explaining the effects of perturbations in procedural text
Dheeraj Rajagopal
Niket Tandon
Bhavana Dalvi
Peter Clarke
Eduard H. Hovy
25
14
0
04 May 2020
The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Mostafa Abdou
Vinit Ravishankar
Maria Barrett
Yonatan Belinkov
Desmond Elliott
Anders Søgaard
ReLM
LRM
62
34
0
04 May 2020
Similarity Analysis of Contextual Word Representation Models
John M. Wu
Yonatan Belinkov
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
James R. Glass
51
73
0
03 May 2020
Quantifying Attention Flow in Transformers
Samira Abnar
Willem H. Zuidema
60
776
0
02 May 2020
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao
H. Trivedi
A. Balasubramanian
Niranjan Balasubramanian
32
66
0
02 May 2020
Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models
Bill Yuchen Lin
Seyeon Lee
Rahul Khanna
Xiang Ren
AIMat
8
154
0
02 May 2020
Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?
Yada Pruksachatkun
Jason Phang
Haokun Liu
Phu Mon Htut
Xiaoyi Zhang
Richard Yuanzhe Pang
Clara Vania
Katharina Kann
Samuel R. Bowman
CLL
LRM
11
194
0
01 May 2020