arXiv:1908.08593
Revealing the Dark Secrets of BERT
21 August 2019
Olga Kovaleva
Alexey Romanov
Anna Rogers
Anna Rumshisky
Papers citing "Revealing the Dark Secrets of BERT"
24 / 24 papers shown
Title
Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models
Yukin Zhang
Qi Dong
89
0
0
23 May 2025
Do Large Language Models know who did what to whom?
Joseph M. Denning
Xiaohan
Bryor Snefjella
Idan A. Blank
250
1
0
23 Apr 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang
Tao Feng
Lilong Xue
Yuandong Wang
Yuxiao Dong
J. Tang
220
12
0
23 Jan 2025
Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
Bohang Sun
Pietro Liò
ViT
AAML
139
1
0
02 Jan 2025
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan
Ercong Nie
Bolei Ma
Michael Farber
89
3
0
18 Feb 2024
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation
Yun-Wei Chu
Dong-Jun Han
Christopher G. Brinton
123
4
0
15 Jan 2024
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
M. Moradshahi
Hamid Palangi
M. Lam
P. Smolensky
Jianfeng Gao
122
16
0
25 Oct 2019
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
109
1,069
0
25 May 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
117
1,148
0
23 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
279
2,326
0
02 May 2019
Linguistic Knowledge and Transferability of Contextual Representations
Nelson F. Liu
Matt Gardner
Yonatan Belinkov
Matthew E. Peters
Noah A. Smith
137
735
0
21 Mar 2019
Pay Less Attention with Lightweight and Dynamic Convolutions
Felix Wu
Angela Fan
Alexei Baevski
Yann N. Dauphin
Michael Auli
89
610
0
29 Jan 2019
Assessing BERT's Syntactic Abilities
Yoav Goldberg
75
496
0
16 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Gongbo Tang
Mathias Müller
Annette Rios Gonzales
Rico Sennrich
77
263
0
27 Aug 2018
Lessons from Natural Language Inference in the Clinical Domain
Alexey Romanov
Chaitanya P. Shivade
LM&MA
85
273
0
21 Aug 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,201
0
20 Apr 2018
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle
Michael Carbin
272
3,488
0
09 Mar 2018
The Importance of Being Recurrent for Modeling Hierarchical Structure
Ke M. Tran
Arianna Bisazza
Christof Monz
76
150
0
09 Mar 2018
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Daniel Cer
Mona T. Diab
Eneko Agirre
I. Lopez-Gazpio
Lucia Specia
445
1,891
0
31 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
803
132,454
0
12 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
524
4,497
0
18 Apr 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
316
8,177
0
16 Jun 2016
How transferable are features in deep neural networks?
J. Yosinski
Jeff Clune
Yoshua Bengio
Hod Lipson
OOD
238
8,353
0
06 Nov 2014