From Sparse to Soft Mixtures of Experts
J. Puigcerver, C. Riquelme, Basil Mustafa, N. Houlsby
2 August 2023 · arXiv:2308.00951 (v2, latest) · [MoE]
Papers citing "From Sparse to Soft Mixtures of Experts" (50 of 90 papers shown)
- Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
  Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao · 20 Jun 2025 [OffRL]
- Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models
  Daniel Fidel Harvey, George Weale, Berk Yilmaz · 19 Jun 2025 [MoE]
- NaSh: Guardrails for an LLM-Powered Natural Language Shell
  Bimal Raj Gyawali, Saikrishna Achalla, Konstantinos Kallas, Sam Kumar · 16 Jun 2025
- Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control
  Rongpeng Li, Jianhang Zhu, Jiahao Huang, Zhifeng Zhao, Honggang Zhang · 14 Jun 2025
- Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
  James Oldfield, Shawn Im, Yixuan Li, M. Nicolaou, Ioannis Patras, Grigorios G. Chrysos · 27 May 2025 [MoE]
- Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning
  Ghada Sokar, Pablo Samuel Castro · 23 May 2025
- Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection
  Haokai Zhang, Shengtao Zhang, Zijian Cai, Heng Wang, Ruixuan Zhu, Zinan Zeng, Minnan Luo · 24 Apr 2025
- Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection
  Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, Xiaoguang Mao · 23 Apr 2025 [SILM]
- Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming
  Zhiqiang He, Zhi Liu · 14 Apr 2025
- Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
  Joshua Fixelle · 11 Apr 2025 [ViT]
- Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection
  Shunxin Chen, Ajian Liu, Junze Zheng, Jun Wan, Kailai Peng, Sergio Escalera, Zhen Lei · 01 Apr 2025 [AAML]
- Reasoning Beyond Limits: Advances and Open Problems for LLMs
  M. Ferrag, Norbert Tihanyi, Merouane Debbah · 26 Mar 2025 [ELM, OffRL, LRM, AI4CE]
- The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas
  Giovanni Franco Gabriel Marraffini, Andrés Cotton, Noe Fabian Hsueh, Axel Fridman, Juan Wisznia, Luciano Del Corro · 25 Mar 2025
- SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking
  Wenrui Cai, Qingjie Liu, Yansen Wang · 24 Mar 2025 [MoE]
- A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
  Siyuan Mu, Sen Lin · 10 Mar 2025 [MoE]
- Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
  Umberto Cappellazzo, Minsu Kim, Stavros Petridis · 09 Mar 2025
- DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
  Y. Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen · 03 Mar 2025 [MoMe, MoE]
- Mixture of Experts for Recognizing Depression from Interview and Reading Tasks
  Loukas Ilias, D. Askounis · 27 Feb 2025
- Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM
  Junxiao Ma, Jingjing Wang, Jiamin Luo, Peiying Yu, Guodong Zhou · 26 Feb 2025
- Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos
  Jiamin Luo, Jingjing Wang, Junxiao Ma, Yujie Jin, Shoushan Li, Guodong Zhou · 26 Feb 2025
- Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
  Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz · 24 Feb 2025 [MoE]
- The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE
  Andrei Chernov, Oleg Novitskij · 24 Feb 2025
- Tight Clusters Make Specialized Experts
  Stefan K. Nielsen, R. Teo, Laziz U. Abdullaev, Tan M. Nguyen · 21 Feb 2025 [MoE]
- Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
  Qingshui Gu, Shu Li, Tianyu Zheng, Zhaoxiang Zhang · 10 Feb 2025
- (GG) MoE vs. MLP on Tabular Data
  Andrei Chernov · 05 Feb 2025 [BDL, MoE]
- A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis
  Binyu Zhang, Shichao Li, Junpeng Jian, Zhu Meng, Limei Guo, Zhicheng Zhao · 13 Jan 2025
- OneLLM: One Framework to Align All Modalities with Language
  Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue · 10 Jan 2025 [MLLM]
- UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
  Jingbo Lin, Zhilu Zhang, Wenbo Li, Renjing Pei, Hang Xu, Hongzhi Zhang, Wangmeng Zuo · 28 Dec 2024
- Complexity Experts are Task-Discriminative Learners for Any Image Restoration
  Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, Radu Timofte · 27 Nov 2024 [MoE]
- Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
  Jiange Yang, Haoyi Zhu, Yanjie Wang, Gangshan Wu, Tong He, Limin Wang · 21 Nov 2024
- MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
  Jie Zhu, Yukang Chen, Mingyu Ding, Ping Luo, Leye Wang, Jingdong Wang · 30 Oct 2024 [DiffM]
- MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
  R. Teo, Tan M. Nguyen · 18 Oct 2024 [MoE]
- Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention
  Pedram Akbarian, Huy Le Nguyen, Xing Han, Nhat Ho · 15 Oct 2024 [MoE]
- MoH: Multi-Head Attention as Mixture-of-Head Attention
  Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan · 15 Oct 2024 [MoE]
- Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift
  Yanru Sun, Zongxia Xie, Emadeldeen Eldele, Dongyue Chen, Q. Hu, Min-man Wu · 13 Oct 2024 [AI4TS]
- Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
  Hongtao Wu, Yijun Yang, Angelica I Aviles-Rivero, Jingjing Ren, Sixiang Chen, Haoyu Chen, Lei Zhu · 10 Oct 2024
- A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
  Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai Li, Yiran Chen · 08 Oct 2024
- Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
  Minh Le, Chau Nguyen, Huy Nguyen, Quyen Tran, Trung Le, Nhat Ho · 03 Oct 2024
- EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
  Haotian Sun, Tao Lei, Bowen Zhang, Yanghao Li, Haoshuo Huang, Ruoming Pang, Bo Dai, Nan Du · 02 Oct 2024 [DiffM, MoE]
- Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
  Ghada Sokar, J. Obando-Ceron, Rameswar Panda, Hugo Larochelle, Pablo Samuel Castro · 02 Oct 2024 [MoE]
- Mastering Chess with a Transformer Model
  Daniel Monroe, The Leela Chess Zero Team · 18 Sep 2024
- Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts
  Youngseog Chung, Dhruv Malik, J. Schneider, Yuanzhi Li, Aarti Singh · 02 Sep 2024 [MoE]
- La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection
  Hang Zou, Chenxi Du, Hui Zhang, Yuan Zhang, Ajian Liu, Jun Wan, Zhen Lei · 23 Aug 2024 [AAML]
- A Unified Framework for Iris Anti-Spoofing: Introducing Iris Anti-Spoofing Cross-Domain-Testing Protocol and Masked-MoE Method
  Hang Zou, Chenxi Du, Ajian Liu, Yuan Zhang, Jing Liu, Mingchuan Yang, Jun Wan, Hui Zhang, Zhenan Sun · 19 Aug 2024
- Layerwise Recurrent Router for Mixture-of-Experts
  Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu · 13 Aug 2024 [MoE]
- MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
  Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen · 15 Jul 2024 [DiffM]
- Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation
  Szymon Płotka, Maciej Chrabaszcz, Przemyslaw Biecek · 10 Jul 2024
- SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
  Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng · 28 Jun 2024 [MoE]
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
  Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Tingting Gao, Xi Li · 28 Jun 2024 [MoE]
- GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
  Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu · 18 Jun 2024 [MoE]