MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
arXiv:2105.05912 · 12 May 2021
Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh
AAML

Papers citing "MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation"

32 / 32 papers shown
Accelerating the Low-Rank Decomposed Models
Habib Hajimolahoseini, Walid Ahmed, Austin Wen, Yang Liu
72 · 0 · 0 · 24 Jul 2024

Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization
Habib Hajimolahoseini, Walid Ahmed, Yang Liu
OffRL, MQ
48 · 7 · 0 · 07 Sep 2023

Annealing Knowledge Distillation
A. Jafari, Mehdi Rezagholizadeh, Pranav Sharma, A. Ghodsi
45 · 79 · 0 · 14 Apr 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE
68 · 2,136 · 0 · 11 Jan 2021

Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh
71 · 27 · 0 · 31 Dec 2020

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
MQ
95 · 807 · 0 · 06 Apr 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky
OffRL
76 · 1,489 · 0 · 27 Feb 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
85 · 150 · 0 · 26 Feb 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
VLM
109 · 1,230 · 0 · 25 Feb 2020

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
162 · 7,437 · 0 · 02 Oct 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
VLM
75 · 1,847 · 0 · 23 Sep 2019

Patient Knowledge Distillation for BERT Model Compression
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
110 · 833 · 0 · 25 Aug 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat
467 · 24,160 · 0 · 26 Jul 2019

BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le
53 · 229 · 0 · 10 Jul 2019

Robust Neural Machine Translation with Doubly Adversarial Inputs
Yong Cheng, Lu Jiang, Wolfgang Macherey
AAML
59 · 255 · 0 · 06 Jun 2019

Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell, Ananya Ganesh, Andrew McCallum
57 · 2,633 · 0 · 05 Jun 2019

PAWS: Paraphrase Adversaries from Word Scrambling
Yuan Zhang, Jason Baldridge, Luheng He
66 · 537 · 0 · 01 Apr 2019

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
56 · 419 · 0 · 28 Mar 2019

Improved Knowledge Distillation via Teacher Assistant
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, H. Ghasemzadeh
89 · 1,073 · 0 · 09 Feb 2019

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy, Ellie Pavlick, Tal Linzen
122 · 1,226 · 0 · 04 Feb 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM, SSL, SSeg
1.2K · 93,936 · 0 · 11 Oct 2018

Hypothesis Only Baselines in Natural Language Inference
Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme
225 · 579 · 0 · 02 May 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
744 · 7,080 · 0 · 20 Apr 2018

Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith
117 · 1,167 · 0 · 06 Mar 2018

Mixed Precision Training
Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
143 · 1,779 · 0 · 10 Oct 2017

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV
519 · 129,831 · 0 · 12 Jun 2017

Categorical Reparameterization with Gumbel-Softmax
Eric Jang, S. Gu, Ben Poole
BDL
234 · 5,323 · 0 · 03 Nov 2016

Adversarial Training Methods for Semi-Supervised Text Classification
Takeru Miyato, Andrew M. Dai, Ian Goodfellow
GAN
67 · 1,056 · 0 · 25 May 2016

Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, Alexandra Birch
174 · 7,683 · 0 · 31 Aug 2015

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
FedML
286 · 19,523 · 0 · 09 Mar 2015

Explaining and Harnessing Adversarial Examples
Ian Goodfellow, Jonathon Shlens, Christian Szegedy
AAML, GAN
201 · 18,922 · 0 · 20 Dec 2014

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Yoshua Bengio, Nicholas Léonard, Aaron Courville
334 · 3,099 · 0 · 15 Aug 2013