From Sparse to Soft Mixtures of Experts
arXiv:2308.00951 · 2 August 2023 (v2, latest)
J. Puigcerver, C. Riquelme, Basil Mustafa, N. Houlsby · MoE
Links: ArXiv (abs) · PDF · HTML

Papers citing "From Sparse to Soft Mixtures of Experts" (40 of 90 papers shown)

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng · MoE · 17 Jun 2024

MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai · MoE · 07 Jun 2024

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi, Hengyuan Zhang, Yatian Wang, J. Pan, Chen Liu, ..., Qixun Zhang, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Qi-fei Liu · DiffM, SLR · 27 May 2024

Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
Huy Le Nguyen, Pedram Akbarian, Trang Pham, Trang Nguyen, Shujian Zhang, Nhat Ho · MoE · 23 May 2024

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin · MoE · 23 May 2024

Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
Huy Nguyen, Nhat Ho, Alessandro Rinaldo · 22 May 2024

Learning More Generalized Experts by Merging Experts in Mixture-of-Experts
Sejik Park · FedML, CLL, MoMe · 19 May 2024

A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts
Xinru Zhang, N. Ou, Berke Doga Basaran, Marco Visentin, Mengyun Qiao, ..., Ouyang Cheng, Yaou Liu, Paul M. Matthews, Chuyang Ye, Wenjia Bai · MedIm · 16 May 2024

A Mixture of Experts Approach to 3D Human Motion Prediction
Edmund Shieh, Joshua Lee Franco, Kang Min Bae, Tej Lalvani · 09 May 2024

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong, Mengzhou Xia, Danqi Chen, Mike Lewis · MoE · 06 May 2024

Adapting to Distribution Shift by Visual Domain Prompt Generation
Zhixiang Chi, Li Gu, Tao Zhong, Huan Liu, Yuanhao Yu, Konstantinos N. Plataniotis, Yang Wang · VLM, OOD · 05 May 2024

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen · 02 May 2024

MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Chi Xu · 02 May 2024

Powering In-Database Dynamic Model Slicing for Structured Data Analytics
Lingze Zeng, Naili Xing, Shaofeng Cai, Gang Chen, Beng Chin Ooi, Jian Pei, Yuncheng Wu · 01 May 2024

Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
Shengzhuang Chen, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, Ying Wei · MoE · 13 Mar 2024

Conditional computation in neural networks: principles and research trends
Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, V. Marsocci, Pasquale Minervini, Jary Pomponi · 12 Mar 2024

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jesse Farebrother, Jordi Orbay, Q. Vuong, Adrien Ali Taïga, Yevgen Chebotar, ..., Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal · OffRL · 06 Mar 2024

XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection
Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu · MoE · 27 Feb 2024

Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, M. Nicolaou, Jiankang Deng, Ioannis Patras · MoE · 19 Feb 2024

Turn Waste into Worth: Rectifying Top-k Router of MoE
Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu · MoE, MoMe · 17 Feb 2024

See More Details: Efficient Image Super-Resolution by Experts Mining
Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yulun Zhang, Radu Timofte · SupR · 05 Feb 2024

On Least Square Estimation in Softmax Gating Mixture of Experts
Huy Nguyen, Nhat Ho, Alessandro Rinaldo · 05 Feb 2024

FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, Suchi Saria · MoE · 05 Feb 2024

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti · MoE · 01 Feb 2024

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You · MoE · 29 Jan 2024

Routers in Vision Mixture of Experts: An Empirical Study
Tianlin Liu, Mathieu Blondel, C. Riquelme, J. Puigcerver · MoE · 29 Jan 2024

Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
Tinghui Ouyang, AprilPyone Maungmaung, Koichi Konishi, Yoshiki Seo, Isao Echizen · AI4MH · 15 Jan 2024

Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu · MoE, ALM · 05 Jan 2024

Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
Matthew Dahl, Varun Magesh, Mirac Suzgun, Daniel E. Ho · HILM, AILaw · 02 Jan 2024

Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference
Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane · 15 Dec 2023

Batched Low-Rank Adaptation of Foundation Models
Yeming Wen, Swarat Chaudhuri · OffRL · 09 Dec 2023

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut · MoE · 01 Dec 2023

HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts
Do Huu Dat, Po Yuan Mao, Tien Hoang Nguyen, Wray Buntine, Bennamoun · 23 Nov 2023

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, ..., Xiaoxing Ma, Lijuan Yang, Zhou Xin, Shupeng Li, Penghao Zhao · LLMAG, KELM · 21 Nov 2023

SiRA: Sparse Mixture of Low Rank Adaptation
Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, ..., Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng · MoE · 15 Nov 2023

Mixture of Weak & Strong Experts on Graphs
Hanqing Zeng, Hanjia Lyu, Diyi Hu, Yinglong Xia, Jiebo Luo · 09 Nov 2023
09 Nov 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar
Dan Alistarh
MQ
MoE
89
29
0
25 Oct 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
188
74
0
06 Oct 2023
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
Chen Dun
Mirian Hipolito Garcia
Guoqing Zheng
Ahmed Hassan Awadallah
Anastasios Kyrillidis
Robert Sim
212
6
0
04 Oct 2023
Soft Merging of Experts with Adaptive Routing
Mohammed Muqeeth
Haokun Liu
Colin Raffel
MoMe
MoE
109
54
0
06 Jun 2023