Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2105.05912
Cited By
MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
12 May 2021
Ahmad Rashid
Vasileios Lioutas
Mehdi Rezagholizadeh
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation"
32 / 32 papers shown
Title
Accelerating the Low-Rank Decomposed Models
Habib Hajimolahoseini
Walid Ahmed
Austin Wen
Yang Liu
72
0
0
24 Jul 2024
Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization
Habib Hajimolahoseini
Walid Ahmed
Yang Liu
OffRL
MQ
48
7
0
07 Sep 2023
Annealing Knowledge Distillation
A. Jafari
Mehdi Rezagholizadeh
Pranav Sharma
A. Ghodsi
45
79
0
14 Apr 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus
Barret Zoph
Noam M. Shazeer
MoE
68
2,136
0
11 Jan 2021
Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid
Vasileios Lioutas
Abbas Ghaddar
Mehdi Rezagholizadeh
71
27
0
31 Dec 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
MQ
95
807
0
06 Apr 2020
A Primer in BERTology: What we know about how BERT works
Anna Rogers
Olga Kovaleva
Anna Rumshisky
OffRL
76
1,489
0
27 Feb 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
85
150
0
26 Feb 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
109
1,230
0
25 Feb 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
162
7,437
0
02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
75
1,847
0
23 Sep 2019
Patient Knowledge Distillation for BERT Model Compression
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
110
833
0
25 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
467
24,160
0
26 Jul 2019
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark
Minh-Thang Luong
Urvashi Khandelwal
Christopher D. Manning
Quoc V. Le
53
229
0
10 Jul 2019
Robust Neural Machine Translation with Doubly Adversarial Inputs
Yong Cheng
Lu Jiang
Wolfgang Macherey
AAML
59
255
0
06 Jun 2019
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell
Ananya Ganesh
Andrew McCallum
57
2,633
0
05 Jun 2019
PAWS: Paraphrase Adversaries from Word Scrambling
Yuan Zhang
Jason Baldridge
Luheng He
66
537
0
01 Apr 2019
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang
Yao Lu
Linqing Liu
Lili Mou
Olga Vechtomova
Jimmy J. Lin
56
419
0
28 Mar 2019
Improved Knowledge Distillation via Teacher Assistant
Seyed Iman Mirzadeh
Mehrdad Farajtabar
Ang Li
Nir Levine
Akihiro Matsukawa
H. Ghasemzadeh
89
1,073
0
09 Feb 2019
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy
Ellie Pavlick
Tal Linzen
122
1,226
0
04 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.2K
93,936
0
11 Oct 2018
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
225
579
0
02 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
744
7,080
0
20 Apr 2018
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
117
1,167
0
06 Mar 2018
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
143
1,779
0
10 Oct 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
519
129,831
0
12 Jun 2017
Categorical Reparameterization with Gumbel-Softmax
Eric Jang
S. Gu
Ben Poole
BDL
234
5,323
0
03 Nov 2016
Adversarial Training Methods for Semi-Supervised Text Classification
Takeru Miyato
Andrew M. Dai
Ian Goodfellow
GAN
67
1,056
0
25 May 2016
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
174
7,683
0
31 Aug 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
286
19,523
0
09 Mar 2015
Explaining and Harnessing Adversarial Examples
Ian Goodfellow
Jonathon Shlens
Christian Szegedy
AAML
GAN
201
18,922
0
20 Dec 2014
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Yoshua Bengio
Nicholas Léonard
Aaron Courville
334
3,099
0
15 Aug 2013
1