Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.16716
Cited By
BASE Layers: Simplifying Training of Large, Sparse Models
30 March 2021
M. Lewis
Shruti Bhosale
Tim Dettmers
Naman Goyal
Luke Zettlemoyer
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BASE Layers: Simplifying Training of Large, Sparse Models"
8 / 208 papers shown
Title
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
29
579
0
10 Jun 2021
Hash Layers For Large Sparse Models
Stephen Roller
Sainbayar Sukhbaatar
Arthur Szlam
Jason Weston
MoE
27
209
0
08 Jun 2021
Staircase Attention for Recurrent Processing of Sequences
Da Ju
Stephen Roller
Sainbayar Sukhbaatar
Jason Weston
32
11
0
08 Jun 2021
M6-T: Exploring Sparse Expert Models and Beyond
An Yang
Junyang Lin
Rui Men
Chang Zhou
Le Jiang
...
Dingyang Zhang
Wei Lin
Lin Qu
Jingren Zhou
Hongxia Yang
MoE
39
24
0
31 May 2021
M6: A Chinese Multimodal Pretrainer
Junyang Lin
Rui Men
An Yang
Chan Zhou
Ming Ding
...
Yong Li
Wei Lin
Jingren Zhou
J. Tang
Hongxia Yang
VLM
MoE
37
133
0
01 Mar 2021
Efficient Transformers: A Survey
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
114
1,104
0
14 Sep 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
255
580
0
12 Mar 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,833
0
17 Sep 2019
Previous
1
2
3
4
5