Cited By
DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning (arXiv 2106.03760)
7 June 2021
Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, M. Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi
MoE
Papers citing "DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning" (32 papers shown)
CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
Wenju Sun, Qingyong Li, Yangli-ao Geng, Boyang Li
MoMe · 47 · 0 · 0 · 11 May 2025

CoCoAFusE: Beyond Mixtures of Experts via Model Fusion
Aurelio Raffa Ugolini, M. Tanelli, Valentina Breschi
MoE · 42 · 0 · 0 · 02 May 2025

Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
MoE · 57 · 0 · 0 · 02 Apr 2025
MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
Zihao Zheng, Xiuping Cui, Size Zheng, Maoliang Li, Jiayu Chen, Yun Liang, Xiang Chen
MQ, MoE · 71 · 0 · 0 · 27 Mar 2025

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin
MoE · 212 · 2 · 0 · 10 Mar 2025

Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts
Wenju Sun, Qingyong Li, Wen Wang, Yangli-ao Geng, Boyang Li
61 · 4 · 0 · 28 Jan 2025
Generate to Discriminate: Expert Routing for Continual Learning
Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary Chase Lipton
102 · 0 · 0 · 31 Dec 2024

ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, Qi Tian
MoE · 50 · 3 · 0 · 21 Oct 2024

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue, Longteng Guo, Jie Cheng, Xuange Gao, Qingbin Liu
MoE · 44 · 0 · 0 · 14 Oct 2024
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Jun Luo, Chong Chen, Shandong Wu
FedML, VLM, MoE · 57 · 3 · 0 · 14 Oct 2024

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Ghada Sokar, J. Obando-Ceron, Rameswar Panda, Hugo Larochelle, Pablo Samuel Castro
MoE · 194 · 3 · 0 · 02 Oct 2024

Mixture of Experts in a Mixture of RL settings
Timon Willi, J. Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro
MoE · 72 · 8 · 0 · 26 Jun 2024
SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry
LRM, ELM · 43 · 4 · 0 · 07 May 2024

Multimodal Clinical Trial Outcome Prediction with Large Language Models
Wenhao Zheng, Dongsheng Peng, Hongxia Xu, Yun Li, Hongtu Zhu, Tianfan Fu, Huaxiu Yao
54 · 5 · 0 · 09 Feb 2024
ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, Huchuan Lu
37 · 28 · 0 · 31 Oct 2023

A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen, Pedram Akbarian, TrungTin Nguyen, Nhat Ho
55 · 11 · 0 · 22 Oct 2023

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen, Longteng Guo, Jianxiang Sun, Shuai Shao, Zehuan Yuan, Liang Lin, Dongyu Zhang
MLLM, VLM, MoE · 60 · 9 · 0 · 23 Aug 2023
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
Sheng Shen, Le Hou, Yan-Quan Zhou, Nan Du, Shayne Longpre, ..., Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou
ALM, MoE · 53 · 55 · 0 · 24 May 2023
Modular Deep Learning
Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, Edoardo Ponti
MoMe, OOD · 64 · 72 · 0 · 22 Feb 2023

Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Michael E. Sander, J. Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel
33 · 20 · 0 · 02 Feb 2023

Adaptive Pattern Extraction Multi-Task Learning for Multi-Step Conversion Estimations
Xuewen Tao, Mingming Ha, Xiaobo Guo, Qiongxu Ma, Ho Kei Cheng, Wenfang Lin
38 · 0 · 0 · 06 Jan 2023

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints
Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne
SSL · 56 · 13 · 0 · 05 Jan 2023
RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction
Donghao Zhou, Chunbin Gu, Junde Xu, Furui Liu, Qiong Wang, Guangyong Chen, Pheng-Ann Heng
MoE · 23 · 4 · 0 · 20 Dec 2022

HMOE: Hypernetwork-based Mixture of Experts for Domain Generalization
Jingang Qu, T. Faney, Zehao Wang, Patrick Gallinari, Soleiman Yousef, J. D. Hemptinne
OOD · 56 · 7 · 0 · 15 Nov 2022
M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang
MoE · 44 · 82 · 0 · 26 Oct 2022
Mixture of experts models for multilevel data: modelling framework and approximation theory
Tsz Chai Fung, Spark C. Tseung
33 · 3 · 0 · 30 Sep 2022

UFO: Unified Feature Optimization
Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, ..., Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
39 · 10 · 0 · 21 Jul 2022

LibMTL: A Python Library for Multi-Task Learning
Baijiong Lin, Yu Zhang
OffRL, AI4CE · 33 · 37 · 0 · 27 Mar 2022

Dynamic and Context-Dependent Stock Price Prediction Using Attention Modules and News Sentiment
Nicole Koenigstein
AIFin · 38 · 1 · 0 · 13 Mar 2022
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, J. Dean, Noam M. Shazeer, W. Fedus
MoE · 29 · 183 · 0 · 17 Feb 2022

Unified Scaling Laws for Routed Language Models
Aidan Clark, Diego de Las Casas, Aurelia Guy, A. Mensch, Michela Paganini, ..., Oriol Vinyals, Jack W. Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan
MoE · 50 · 177 · 0 · 02 Feb 2022
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
Xiaonan Nie, Xupeng Miao, Shijie Cao, Lingxiao Ma, Qibin Liu, Jilong Xue, Youshan Miao, Yi Liu, Zhi-Xin Yang, Tengjiao Wang
MoMe, MoE · 32 · 23 · 0 · 29 Dec 2021