One-stop Training of Multiple Capacity Models
Lan Jiang, Haoyang Huang, Dongdong Zhang, R. Jiang, Furu Wei
23 May 2023, arXiv:2305.14066

Papers citing "One-stop Training of Multiple Capacity Models" (13 papers shown)
1. Small Models are Valuable Plug-ins for Large Language Models. Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, Julian McAuley. 15 May 2023.
2. ROSE: Robust Selective Fine-tuning for Pre-trained Language Models. Lan Jiang, Hao Zhou, Yankai Lin, Peng Li, Jie Zhou, R. Jiang. 18 Oct 2022.
3. A Survey on Model Compression and Acceleration for Pretrained Language Models. Canwen Xu, Julian McAuley. 15 Feb 2022.
4. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He. 14 Jan 2022.
5. Learning Language Specific Sub-network for Multilingual Machine Translation. Zehui Lin, Liwei Wu, Mingxuan Wang, Lei Li. 19 May 2021.
6. Learning Student-Friendly Teacher Networks for Knowledge Distillation. D. Park, Moonsu Cha, C. Jeong, Daesin Kim, Bohyung Han. 12 Feb 2021.
7. Multi-task Learning for Multilingual Neural Machine Translation. Yiren Wang, Chengxiang Zhai, Hany Awadalla. 06 Oct 2020.
8. Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation. Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao. 21 Apr 2020.
9. Unsupervised Cross-lingual Representation Learning at Scale. Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov. 05 Nov 2019.
10. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. 02 Oct 2019.
11. Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. 25 Sep 2019.
12. TinyBERT: Distilling BERT for Natural Language Understanding. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu. 23 Sep 2019.
13. Are Sixteen Heads Really Better than One? Paul Michel, Omer Levy, Graham Neubig. 25 May 2019.