arXiv: 1701.06538
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
23 January 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
Papers citing "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" (50 of 126 papers shown)
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
171
21
0
17 Jan 2025
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
Xueyan Li
Xinyan Chen
Yazhe Niu
Shuai Hu
Yu Liu
OffRL
79
3
0
17 Jan 2025
Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning
Zhongyi Zhou
Chaomin Shen
Pin Yi
Minjie Zhu
Yaxin Peng
351
0
0
04 Jan 2025
Spatio-Temporal Multi-Subgraph GCN for 3D Human Motion Prediction
Jiexin Wang
Yiju Guo
Fuchun Sun
3DH
78
0
0
03 Jan 2025
SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection
Yuchen Li
Xianrui Li
Yunheng Li
Yanzhe Zhang
Yimian Dai
Qibin Hou
Ming-Ming Cheng
Jian Yang
119
7
0
31 Dec 2024
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
115
0
0
31 Dec 2024
Generate to Discriminate: Expert Routing for Continual Learning
Yewon Byun
Sanket Vaibhav Mehta
Saurabh Garg
Emma Strubell
Michael Oberst
Bryan Wilder
Zachary Chase Lipton
119
0
0
31 Dec 2024
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
Jingbo Lin
Zhilu Zhang
Wenbo Li
Renjing Pei
Hang Xu
Hongzhi Zhang
Wangmeng Zuo
63
0
0
28 Dec 2024
Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond
Loukas Ilias
George Doukas
Vangelis Lamprou
Christos Ntanos
D. Askounis
MoE
94
1
0
04 Dec 2024
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Qu He
Jinlong Peng
P. Xu
Boyuan Jiang
Xiaobin Hu
...
Yang Liu
Yun Wang
Chengjie Wang
Xuelong Li
Jing Zhang
DiffM
158
1
0
04 Dec 2024
One Model for One Graph: A New Perspective for Pretraining with Cross-domain Graphs
J. Liu
Haitao Mao
Zhikai Chen
Wenqi Fan
Mingxuan Ju
Tong Zhao
Neil Shah
Jiliang Tang
AI4CE
147
1
0
30 Nov 2024
HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting
Shaohan Yu
Pan Deng
Yu Zhao
Jiaheng Liu
Zi'ang Wang
MoE
411
0
0
30 Nov 2024
Task Singular Vectors: Reducing Task Interference in Model Merging
Antonio Andrea Gargiulo
Donato Crisostomi
Maria Sofia Bucarelli
Simone Scardapane
Fabrizio Silvestri
Emanuele Rodolà
MoMe
119
14
0
26 Nov 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Yanjie Wang
Gangshan Wu
Tong He
Limin Wang
123
3
0
21 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
407
3
0
20 Nov 2024
MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation
Langlin Huang
Mengyu Bu
Yang Feng
57
0
0
03 Nov 2024
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song
Zihang Zhong
Rong Chen
Haibo Chen
MoE
87
5
0
29 Oct 2024
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
50
3
0
28 Oct 2024
Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization
Zhecheng Li
Yijiao Wang
Bryan Hooi
Yujun Cai
Naifan Cheung
Nanyun Peng
Kai-Wei Chang
91
1
0
26 Oct 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
57
4
0
24 Oct 2024
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Artem Basharin
Andrei Chertkov
Ivan Oseledets
95
1
0
23 Oct 2024
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Boxing Chen
Sarath Chandar
94
0
0
22 Oct 2024
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su
Xing Wu
Zijia Lin
Yizhe Xiong
Minxuan Lv
Guangyuan Ma
Hui Chen
Songlin Hu
Guiguang Ding
MoE
46
4
0
21 Oct 2024
MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
Suning Huang
Zheyu Zhang
Tianhai Liang
Yihan Xu
Zhehao Kou
Chenhao Lu
Guowei Xu
Zhengrong Xue
Huazhe Xu
MoE
72
3
0
19 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
82
20
0
17 Oct 2024
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
Yulei Qian
Fengcun Li
Xiangyang Ji
Xiaoyu Zhao
Jianchao Tan
Kai Zhang
Xunliang Cai
MoE
93
3
0
16 Oct 2024
GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation
Fei Tang
Yongliang Shen
Hang Zhang
Zeqi Tan
Wenqi Zhang
Guiyang Hou
Kaitao Song
Weiming Lu
Yueting Zhuang
77
0
0
15 Oct 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
48
24
0
14 Oct 2024
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Jun Luo
Chong Chen
Shandong Wu
FedML
VLM
MoE
66
3
0
14 Oct 2024
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Guorui Zheng
Xidong Wang
Juhao Liang
Nuo Chen
Yuping Zheng
Benyou Wang
MoE
74
5
0
14 Oct 2024
GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks
Dingyi Zhuang
Chonghe Jiang
Yunhan Zheng
Shenhao Wang
Jinhua Zhao
UQCV
70
0
0
12 Oct 2024
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
Zhipeng Chen
Liang Song
K. Zhou
Wayne Xin Zhao
Binghai Wang
Weipeng Chen
Ji-Rong Wen
81
0
0
10 Oct 2024
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
Sagi Shaier
Francisco Pereira
Katharina von der Wense
Lawrence E Hunter
Matt Jones
MoE
66
0
0
10 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
47
4
0
08 Oct 2024
No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova
Angelos Katharopoulos
David Grangier
Ronan Collobert
MoE
54
0
0
04 Oct 2024
Collaborative and Efficient Personalization with Mixtures of Adaptors
Abdulla Jasem Almansoori
Samuel Horváth
Martin Takáč
FedML
64
3
0
04 Oct 2024
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Minh Le
Chau Nguyen
Huy Nguyen
Quyen Tran
Trung Le
Nhat Ho
64
5
0
03 Oct 2024
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Ghada Sokar
J. Obando-Ceron
Rameswar Panda
Hugo Larochelle
Pablo Samuel Castro
MoE
240
5
0
02 Oct 2024
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Haotian Sun
Tao Lei
Bowen Zhang
Yanghao Li
Haoshuo Huang
Ruoming Pang
Bo Dai
Nan Du
DiffM
MoE
112
5
0
02 Oct 2024
HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Bingshen Mu
Kun Wei
Qijie Shao
Yong Xu
Lei Xie
MoE
60
2
0
30 Sep 2024
Scaling Optimal LR Across Token Horizons
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
80
5
0
30 Sep 2024
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Jihai Zhang
Xiaoye Qu
Tong Zhu
Yu Cheng
65
8
0
28 Sep 2024
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi
Shiyu Wang
Yuqi Nie
Dianqi Li
Zhou Ye
Qingsong Wen
Ming Jin
AI4TS
66
37
0
24 Sep 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
86
1
0
16 Sep 2024
Breaking Neural Network Scaling Laws with Modularity
Akhilan Boopathy
Sunshine Jiang
William Yue
Jaedong Hwang
Abhiram Iyer
Ila Fiete
OOD
98
2
0
09 Sep 2024
Continual learning with the neural tangent ensemble
Ari S. Benjamin
Christian Pehle
Kyle Daruwalla
UQCV
94
0
0
30 Aug 2024
DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation
Xiaowei Mao
Yan Lin
Shengnan Guo
Yubin Chen
Xingyu Xian
Haomin Wen
Qisen Xu
Youfang Lin
Huaiyu Wan
59
1
0
23 Aug 2024
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation
Xiaoyu Kong
Jiancan Wu
An Zhang
Leheng Sheng
Hui Lin
Xiang Wang
Xiangnan He
AI4TS
83
10
0
19 Aug 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu
Zeyu Huang
Shuang Cheng
Yizhi Zhou
Zili Wang
Ivan Titov
Jie Fu
MoE
107
2
0
13 Aug 2024
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs
Quang H. Nguyen
Duy C. Hoang
Juliette Decugis
Saurav Manchanda
Nitesh Chawla
Khoa D. Doan
131
8
0
15 Jul 2024