ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2101.08890
  4. Cited By
Distilling Large Language Models into Tiny and Effective Students using
  pQRNN

Distilling Large Language Models into Tiny and Effective Students using pQRNN

21 January 2021
P. Kaliamoorthi
Aditya Siddhant
Edward Li
Melvin Johnson
    MQ
ArXiv (abs)PDFHTML

Papers citing "Distilling Large Language Models into Tiny and Effective Students using pQRNN"

41 / 41 papers shown
Title
Rethinking embedding coupling in pre-trained language models
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
152
143
0
24 Oct 2020
mT5: A massively multilingual pre-trained text-to-text transformer
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
142
2,559
0
22 Oct 2020
FILTER: An Enhanced Fusion Method for Cross-lingual Language
  Understanding
FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
Yuwei Fang
Shuohang Wang
Zhe Gan
S. Sun
Jingjing Liu
VLM
57
58
0
10 Sep 2020
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing
  Benchmark
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark
Haoran Li
Abhinav Arora
Shuohui Chen
Anchit Gupta
Sonal Gupta
Yashar Mehdad
107
179
0
21 Aug 2020
An Empirical Analysis of the Impact of Data Augmentation on Knowledge
  Distillation
An Empirical Analysis of the Impact of Data Augmentation on Knowledge Distillation
Deepan Das
Haley Massa
Abhimanyu Kulkarni
Theodoros Rekatsinas
58
18
0
06 Jun 2020
End-to-End Slot Alignment and Recognition for Cross-Lingual NLU
End-to-End Slot Alignment and Recognition for Cross-Lingual NLU
Weijia Xu
Batool Haider
Saab Mansour
51
155
0
29 Apr 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou
Zhiqi Huang
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
MQ
79
322
0
08 Apr 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
MQ
109
817
0
06 Apr 2020
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating
  Cross-lingual Generalization
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Junjie Hu
Sebastian Ruder
Aditya Siddhant
Graham Neubig
Orhan Firat
Melvin Johnson
ELM
183
975
0
24 Mar 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression
  of Pre-Trained Transformers
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
156
1,278
0
25 Feb 2020
Compressing BERT: Studying the Effects of Weight Pruning on Transfer
  Learning
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Mitchell A. Gordon
Kevin Duh
Nicholas Andrews
VLM
59
342
0
19 Feb 2020
Unsupervised Cross-lingual Representation Learning at Scale
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Edouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
228
6,585
0
05 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
456
20,298
0
23 Oct 2019
Q8BERT: Quantized 8Bit BERT
Q8BERT: Quantized 8Bit BERT
Ofir Zafrir
Guy Boudoukh
Peter Izsak
Moshe Wasserblat
MQ
83
505
0
14 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language
  Representations
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSLAIMat
371
6,463
0
26 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
109
1,869
0
23 Sep 2019
Evaluating the Cross-Lingual Effectiveness of Massively Multilingual
  Neural Machine Translation
Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
Aditya Siddhant
Melvin Johnson
Henry Tsai
N. Arivazhagan
Jason Riesa
Ankur Bapna
Orhan Firat
Karthik Raman
63
70
0
01 Sep 2019
Small and Practical BERT Models for Sequence Labeling
Small and Practical BERT Models for Sequence Labeling
Henry Tsai
Jason Riesa
Melvin Johnson
N. Arivazhagan
Xin Li
Amelia Archer
VLM
60
121
0
31 Aug 2019
Patient Knowledge Distillation for BERT Model Compression
Patient Knowledge Distillation for BERT Model Compression
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
134
843
0
25 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
674
24,528
0
26 Jul 2019
Massively Multilingual Neural Machine Translation in the Wild: Findings
  and Challenges
Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
N. Arivazhagan
Ankur Bapna
Orhan Firat
Dmitry Lepikhin
Melvin Johnson
...
George F. Foster
Colin Cherry
Wolfgang Macherey
Zhiwen Chen
Yonghui Wu
85
428
0
11 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
234
8,444
0
19 Jun 2019
Making Neural Machine Reading Comprehension Faster
Making Neural Machine Reading Comprehension Faster
Debajyoti Chatterjee
AIMat
58
9
0
29 Mar 2019
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang
Yao Lu
Linqing Liu
Lili Mou
Olga Vechtomova
Jimmy J. Lin
69
421
0
28 Mar 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLMSSLSSeg
1.8K
95,114
0
11 Oct 2018
Universal Transformers
Universal Transformers
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser
87
755
0
10 Jul 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,196
0
20 Apr 2018
Quantization and Training of Neural Networks for Efficient
  Integer-Arithmetic-Only Inference
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
159
3,138
0
15 Dec 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
728
132,199
0
12 Jun 2017
Google's Multilingual Neural Machine Translation System: Enabling
  Zero-Shot Translation
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Melvin Johnson
M. Schuster
Quoc V. Le
M. Krikun
Yonghui Wu
...
F. Viégas
Martin Wattenberg
Gregory S. Corrado
Macduff Hughes
Jeffrey Dean
122
2,096
0
14 Nov 2016
Quasi-Recurrent Neural Networks
Quasi-Recurrent Neural Networks
James Bradbury
Stephen Merity
Caiming Xiong
R. Socher
170
444
0
05 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhiwen Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
905
6,796
0
26 Sep 2016
Attention-Based Recurrent Neural Network Models for Joint Intent
  Detection and Slot Filling
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling
Bing-Quan Liu
Ian Lane
86
677
0
06 Sep 2016
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
David M. Krueger
Tegan Maharaj
János Kramár
Mohammad Pezeshki
Nicolas Ballas
Nan Rosemary Ke
Anirudh Goyal
Yoshua Bengio
Aaron Courville
C. Pal
82
317
0
03 Jun 2016
Revisiting Distributed Synchronous SGD
Revisiting Distributed Synchronous SGD
Jianmin Chen
Xinghao Pan
R. Monga
Samy Bengio
Rafal Jozefowicz
87
801
0
04 Apr 2016
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained
  Quantization and Huffman Coding
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han
Huizi Mao
W. Dally
3DGS
263
8,854
0
01 Oct 2015
Learning both Weights and Connections for Efficient Neural Networks
Learning both Weights and Connections for Efficient Neural Networks
Song Han
Jeff Pool
J. Tran
W. Dally
CVBM
313
6,694
0
08 Jun 2015
Distilling the Knowledge in a Neural Network
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
364
19,723
0
09 Mar 2015
Batch Normalization: Accelerating Deep Network Training by Reducing
  Internal Covariate Shift
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
463
43,328
0
11 Feb 2015
Adam: A Method for Stochastic Optimization
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.0K
150,260
0
22 Dec 2014
Compressing Deep Convolutional Networks using Vector Quantization
Compressing Deep Convolutional Networks using Vector Quantization
Yunchao Gong
Liu Liu
Ming Yang
Lubomir D. Bourdev
MQ
168
1,171
0
18 Dec 2014
1