1701.06538
Cited By
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
23 January 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
Papers citing
"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"
26 / 126 papers shown
Title
Low-Rank Interconnected Adaptation across Layers
Yibo Zhong
Jinman Zhao
Yao Zhou
OffRL
MoE
71
1
0
13 Jul 2024
PLeaS -- Merging Models with Permutations and Least Squares
Anshul Nasery
J. Hayase
Pang Wei Koh
Sewoong Oh
MoMe
67
3
0
02 Jul 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang
Dong Shen
Chaoxiang Cai
Fan Yang
Size Li
Tingting Gao
Xi Li
MoE
80
2
0
28 Jun 2024
Compositional Models for Estimating Causal Effects
Purva Pruthi
David D. Jensen
CML
111
0
0
25 Jun 2024
Submodular Framework for Structured-Sparse Optimal Transport
Piyushi Manupriya
Pratik Jawanpuria
Karthik S. Gurumoorthy
SakethaNath Jagarlapudi
Bamdev Mishra
OT
118
0
0
07 Jun 2024
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Dujian Ding
Bicheng Xu
L. Lakshmanan
VLM
73
1
0
06 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Yangfu Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
60
10
0
04 Jun 2024
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi
Hengyuan Zhang
Yatian Wang
J. Pan
Chen Liu
...
Qixun Zhang
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Qi-fei Liu
DiffM
SLR
125
4
0
27 May 2024
Mixture of Experts Meets Prompt-Based Continual Learning
Minh Le
An Nguyen
Huy Nguyen
Trang Nguyen
Trang Pham
L. Ngo
Nhat Ho
CLL
71
8
0
23 May 2024
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo
Zhenglin Cheng
Xiaoying Tang
Tao R. Lin
Tao Lin
MoE
86
8
0
23 May 2024
DirectMultiStep: Direct Route Generation for Multistep Retrosynthesis
Yu Shee
Haote Li
Anton Morgunov
Victor S. Batista
63
2
0
22 May 2024
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Zhili Liu
Yunhao Gou
Kai Chen
Lanqing Hong
Jiahui Gao
...
Yu Zhang
Zhenguo Li
Xin Jiang
Qiang Liu
James T. Kwok
MoE
171
9
0
01 May 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
88
8
0
07 Apr 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
67
2
0
26 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
132
6
0
03 Mar 2024
Multimodal Clinical Trial Outcome Prediction with Large Language Models
Wenhao Zheng
Dongsheng Peng
Hongxia Xu
Yun Li
Hongtu Zhu
Tianfan Fu
Huaxiu Yao
102
5
0
09 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomas Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
164
389
0
09 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
149
112
0
08 Feb 2024
FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
Xing Han
Huy Nguyen
Carl Harris
Nhat Ho
Suchi Saria
MoE
88
18
0
05 Feb 2024
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
67
0
0
01 Dec 2023
HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts
Do Huu Dat
Po Yuan Mao
Tien Hoang Nguyen
Wray Buntine
Bennamoun
71
1
0
23 Nov 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
95
30
0
26 Mar 2023
Motion Prediction Under Multimodality with Conditional Stochastic Networks
Katerina Fragkiadaki
Jonathan Huang
Alexander A. Alemi
Sudheendra Vijayanarasimhan
Susanna Ricco
Rahul Sukthankar
3DH
51
25
0
05 May 2017
Expert Gate: Lifelong Learning with a Network of Experts
Rahaf Aljundi
Punarjay Chakravarty
Tinne Tuytelaars
CLL
58
654
0
18 Nov 2016
Memory-Efficient Backpropagation Through Time
A. Gruslys
Rémi Munos
Ivo Danihelka
Marc Lanctot
Alex Graves
47
228
0
10 Jun 2016
Building high-level features using large scale unsupervised learning
Quoc V. Le
Marc'Aurelio Ranzato
R. Monga
M. Devin
Kai Chen
G. Corrado
J. Dean
A. Ng
SSL
OffRL
CVBM
74
2,268
0
29 Dec 2011