MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

25 February 2020

Papers cited by "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers"


Title
Small and Practical BERT Models for Sequence Labeling Henry Tsai Jason Riesa Melvin Johnson N. Arivazhagan Xin Li Amelia Archer VLM 22 121 0 31 Aug 2019
Patient Knowledge Distillation for BERT Model Compression S. Sun Yu Cheng Zhe Gan Jingjing Liu 98 833 0 25 Aug 2019
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models Iulia Turc Ming-Wei Chang Kenton Lee Kristina Toutanova 44 224 0 23 Aug 2019
Text Summarization with Pretrained Encoders Yang Liu Mirella Lapata MILM 385 1,439 0 22 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis Luke Zettlemoyer Veselin Stoyanov AIMat 368 24,160 0 26 Jul 2019
SpanBERT: Improving Pre-training by Representing and Predicting Spans Mandar Joshi Danqi Chen Yinhan Liu Daniel S. Weld Luke Zettlemoyer Omer Levy 104 1,953 0 24 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding Zhilin Yang Zihang Dai Yiming Yang J. Carbonell Ruslan Salakhutdinov Quoc V. Le AI4CE 160 8,386 0 19 Jun 2019
A Multiscale Visualization of Attention in the Transformer Model Jesse Vig ViT 50 571 0 12 Jun 2019
What Does BERT Look At? An Analysis of BERT's Attention Kevin Clark Urvashi Khandelwal Omer Levy Christopher D. Manning MILM 179 1,586 0 11 Jun 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation Li Dong Nan Yang Wenhui Wang Furu Wei Xiaodong Liu Yu Wang Jianfeng Gao M. Zhou H. Hon ELM AI4CE 140 1,553 0 08 May 2019
MASS: Masked Sequence to Sequence Pre-training for Language Generation Kaitao Song Xu Tan Tao Qin Jianfeng Lu Tie-Yan Liu 79 962 0 07 May 2019
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks Raphael Tang Yao Lu Linqing Liu Lili Mou Olga Vechtomova Jimmy J. Lin 47 419 0 28 Mar 2019
Cloze-driven Pretraining of Self-attention Networks Alexei Baevski Sergey Edunov Yinhan Liu Luke Zettlemoyer Michael Auli 27 198 0 19 Mar 2019
Cross-lingual Language Model Pretraining Guillaume Lample Alexis Conneau 47 2,727 0 22 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 853 93,936 0 11 Oct 2018
XNLI: Evaluating Cross-lingual Sentence Representations Alexis Conneau Guillaume Lample Ruty Rinott Adina Williams Samuel R. Bowman Holger Schwenk Veselin Stoyanov ELM 46 1,366 0 13 Sep 2018
Bottom-Up Abstractive Summarization Sebastian Gehrmann Yuntian Deng Alexander M. Rush CVBM 78 688 0 31 Aug 2018
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization Shashi Narayan Shay B. Cohen Mirella Lapata AILaw 92 1,652 0 27 Aug 2018
Attention-Guided Answer Distillation for Machine Reading Comprehension Minghao Hu Yuxing Peng Furu Wei Zhen Huang Dongsheng Li Nan Yang M. Zhou FaML 48 75 0 23 Aug 2018
Know What You Don't Know: Unanswerable Questions for SQuAD Pranav Rajpurkar Robin Jia Percy Liang RALM ELM 162 2,818 0 11 Jun 2018
A Simple Method for Commonsense Reasoning Trieu H. Trinh Quoc V. Le LRM ReLM 75 432 0 07 Jun 2018
Neural Network Acceptability Judgments Alex Warstadt Amanpreet Singh Samuel R. Bowman 150 1,390 0 31 May 2018
Harvesting Paragraph-Level Question-Answer Pairs from Wikipedia Xinya Du Claire Cardie KELM 37 161 0 15 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 572 7,080 0 20 Apr 2018
Deep contextualized word representations Matthew E. Peters Mark Neumann Mohit Iyyer Matt Gardner Christopher Clark Kenton Lee Luke Zettlemoyer NAI 88 11,520 0 15 Feb 2018
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation Daniel Cer Mona T. Diab Eneko Agirre I. Lopez-Gazpio Lucia Specia 149 1,870 0 31 Jul 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 341 129,831 0 12 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference Adina Williams Nikita Nangia Samuel R. Bowman 363 4,444 0 18 Apr 2017
Get To The Point: Summarization with Pointer-Generator Networks A. See Peter J. Liu Christopher D. Manning 3DPC 170 4,003 0 14 Apr 2017
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer Sergey Zagoruyko N. Komodakis 92 2,561 0 12 Dec 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 788 6,768 0 26 Sep 2016
Layer Normalization Jimmy Lei Ba J. Kiros Geoffrey E. Hinton 219 10,412 0 21 Jul 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar Jian Zhang Konstantin Lopyrev Percy Liang RALM 123 8,067 0 16 Jun 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 1.2K 192,638 0 10 Dec 2015
Rethinking the Inception Architecture for Computer Vision Christian Szegedy Vincent Vanhoucke Sergey Ioffe Jonathon Shlens Z. Wojna 3DV BDL 403 27,231 0 02 Dec 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books Yukun Zhu Ryan Kiros R. Zemel Ruslan Salakhutdinov R. Urtasun Antonio Torralba Sanja Fidler 89 2,529 0 22 Jun 2015
Distilling the Knowledge in a Neural Network Geoffrey E. Hinton Oriol Vinyals J. Dean FedML 169 19,448 0 09 Mar 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 632 149,474 0 22 Dec 2014
FitNets: Hints for Thin Deep Nets Adriana Romero Nicolas Ballas Samira Ebrahimi Kahou Antoine Chassang C. Gatta Yoshua Bengio FedML 214 3,862 0 19 Dec 2014
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 356 27,205 0 01 Sep 2014