M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
arXiv:2210.14793 · 26 October 2022
Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang
MoE

Cited By
Papers citing "M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design" (8 of 58 papers shown):
Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers
Shiwei Liu, Zhangyang Wang
32 · 30 · 0 · 06 Feb 2023

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners
Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, E. Learned-Miller, Chuang Gan
MoE · 13 · 14 · 0 · 15 Dec 2022

Accelerating Distributed MoE Training and Inference with Lina
Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong-Yu Xu
MoE · 22 · 58 · 0 · 31 Oct 2022

Tricks for Training Sparse Translation Models
Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, M. Lewis, Angela Fan
MoE · 150 · 19 · 0 · 15 Oct 2021

Transformer in Transformer
Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang
ViT · 289 · 1,524 · 0 · 27 Feb 2021

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
ViT · 289 · 3,623 · 0 · 24 Feb 2021

A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
213 · 1,367 · 0 · 06 Jun 2016

Learning Task Grouping and Overlap in Multi-task Learning
Abhishek Kumar, Hal Daumé
184 · 524 · 0 · 27 Jun 2012