One-stop Training of Multiple Capacity Models
arXiv:2305.14066 · 23 May 2023
Lan Jiang, Haoyang Huang, Dongdong Zhang, R. Jiang, Furu Wei
Papers citing "One-stop Training of Multiple Capacity Models" (13 of 13 shown):
Small Models are Valuable Plug-ins for Large Language Models [LLMAG] (15 May 2023)
Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, Julian McAuley

ROSE: Robust Selective Fine-tuning for Pre-trained Language Models [AAML] (18 Oct 2022)
Lan Jiang, Hao Zhou, Yankai Lin, Peng Li, Jie Zhou, R. Jiang

A Survey on Model Compression and Acceleration for Pretrained Language Models (15 Feb 2022)
Canwen Xu, Julian McAuley

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (14 Jan 2022)
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He

Learning Language Specific Sub-network for Multilingual Machine Translation (19 May 2021)
Zehui Lin, Liwei Wu, Mingxuan Wang, Lei Li

Learning Student-Friendly Teacher Networks for Knowledge Distillation (12 Feb 2021)
D. Park, Moonsu Cha, C. Jeong, Daesin Kim, Bohyung Han

Multi-task Learning for Multilingual Neural Machine Translation (06 Oct 2020)
Yiren Wang, Chengxiang Zhai, Hany Awadalla

Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [AIMat] (21 Apr 2020)
Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

Unsupervised Cross-lingual Representation Learning at Scale (05 Nov 2019)
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (02 Oct 2019)
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

Reducing Transformer Depth on Demand with Structured Dropout (25 Sep 2019)
Angela Fan, Edouard Grave, Armand Joulin

TinyBERT: Distilling BERT for Natural Language Understanding [VLM] (23 Sep 2019)
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu

Are Sixteen Heads Really Better than One? [MoE] (25 May 2019)
Paul Michel, Omer Levy, Graham Neubig