v1v2 (latest)

Revealing the Dark Secrets of BERT

21 August 2019

Papers citing "Revealing the Dark Secrets of BERT"

24 / 24 papers shown

Title
Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models Yukin Zhang Qi Dong 89 0 0 23 May 2025
Do Large Language Models know who did what to whom? Joseph M. Denning Xiaohan Bryor Snefjella Idan A. Blank 250 1 0 23 Apr 2025
Parameter-Efficient Fine-Tuning for Foundation Models Dan Zhang Tao Feng Lilong Xue Yuandong Wang Yuxiao Dong J. Tang 220 12 0 23 Jan 2025
Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers Bohang Sun Pietro Liò ViT AAML 139 1 0 02 Jan 2025
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers Shuzhou Yuan Ercong Nie Bolei Ma Michael Farber 89 3 0 18 Feb 2024
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation Yun-Wei Chu Dong-Jun Han Christopher G. Brinton 123 4 0 15 Jan 2024
HUBERT Untangles BERT to Improve Transfer across NLP Tasks M. Moradshahi Hamid Palangi M. Lam P. Smolensky Jianfeng Gao 122 16 0 25 Oct 2019
Are Sixteen Heads Really Better than One? Paul Michel Omer Levy Graham Neubig MoE 109 1,069 0 25 May 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned Elena Voita David Talbot F. Moiseev Rico Sennrich Ivan Titov 117 1,148 0 23 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems Alex Jinpeng Wang Yada Pruksachatkun Nikita Nangia Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 279 2,326 0 02 May 2019
Linguistic Knowledge and Transferability of Contextual Representations Nelson F. Liu Matt Gardner Yonatan Belinkov Matthew E. Peters Noah A. Smith 137 735 0 21 Mar 2019
Pay Less Attention with Lightweight and Dynamic Convolutions Felix Wu Angela Fan Alexei Baevski Yann N. Dauphin Michael Auli 89 610 0 29 Jan 2019
Assessing BERT's Syntactic Abilities Yoav Goldberg 75 496 0 16 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,229 0 11 Oct 2018
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures Gongbo Tang Mathias Müller Annette Rios Gonzales Rico Sennrich 77 263 0 27 Aug 2018
Lessons from Natural Language Inference in the Clinical Domain Alexey Romanov Chaitanya P. Shivade LM&MA 85 273 0 21 Aug 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 1.1K 7,201 0 20 Apr 2018
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks Jonathan Frankle Michael Carbin 272 3,488 0 09 Mar 2018
The Importance of Being Recurrent for Modeling Hierarchical Structure Ke M. Tran Arianna Bisazza Christof Monz 76 150 0 09 Mar 2018
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation Daniel Cer Mona T. Diab Eneko Agirre I. Lopez-Gazpio Lucia Specia 445 1,891 0 31 Jul 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 803 132,454 0 12 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference Adina Williams Nikita Nangia Samuel R. Bowman 524 4,497 0 18 Apr 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar Jian Zhang Konstantin Lopyrev Percy Liang RALM 316 8,177 0 16 Jun 2016
How transferable are features in deep neural networks? J. Yosinski Jeff Clune Yoshua Bengio Hod Lipson OOD 238 8,353 0 06 Nov 2014