From Sparse to Soft Mixtures of Experts (arXiv:2308.00951, v2 latest) [MoE]
2 August 2023
J. Puigcerver, C. Riquelme, Basil Mustafa, N. Houlsby
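For context on what the cited paper proposes: Soft MoE replaces hard, sparse token-to-expert assignment with fully differentiable soft routing, where each expert slot receives a convex combination of all input tokens and each output token is a convex combination of all slot outputs. The sketch below is a minimal, illustrative NumPy version of that routing step; the function names, shapes, and identity "experts" are assumptions for illustration, not the authors' reference implementation.

    import numpy as np

    def softmax(z, axis):
        z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def soft_moe(x, phi, experts, slots_per_expert):
        # x: (n_tokens, d); phi: (d, n_experts * slots_per_expert); experts: callables (s, d) -> (s, d)
        logits = x @ phi                    # (n_tokens, n_slots) token-slot affinities
        dispatch = softmax(logits, axis=0)  # normalize over tokens: each slot is a convex mix of tokens
        combine = softmax(logits, axis=1)   # normalize over slots: each token mixes all slot outputs
        slot_in = dispatch.T @ x            # (n_slots, d) soft slot inputs
        slot_out = np.concatenate(
            [f(slot_in[i * slots_per_expert:(i + 1) * slots_per_expert])
             for i, f in enumerate(experts)],
            axis=0,
        )
        return combine @ slot_out           # (n_tokens, d)

    # Toy usage: 16 tokens of width 8, 4 experts with 2 slots each, identity experts.
    x = np.random.randn(16, 8)
    phi = np.random.randn(8, 4 * 2)
    experts = [lambda s: s for _ in range(4)]
    print(soft_moe(x, phi, experts, slots_per_expert=2).shape)  # (16, 8)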

Papers citing "From Sparse to Soft Mixtures of Experts"
(50 of 90 papers shown)
Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning (20 Jun 2025) [OffRL]
  Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao
Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models (19 Jun 2025) [MoE]
  Daniel Fidel Harvey, George Weale, Berk Yilmaz
NaSh: Guardrails for an LLM-Powered Natural Language Shell (16 Jun 2025)
  Bimal Raj Gyawali, Saikrishna Achalla, Konstantinos Kallas, Sam Kumar
Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control (14 Jun 2025)
  Rongpeng Li, Jianhang Zhu, Jiahao Huang, Zhifeng Zhao, Honggang Zhang
Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders (27 May 2025) [MoE]
  James Oldfield, Shawn Im, Yixuan Li, M. Nicolaou, Ioannis Patras, Grigorios G. Chrysos
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning (23 May 2025)
  Ghada Sokar, Pablo Samuel Castro
Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection (24 Apr 2025)
  Haokai Zhang, Shengtao Zhang, Zijian Cai, Heng Wang, Ruixuan Zhu, Zinan Zeng, Minnan Luo
Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection (23 Apr 2025) [SILM]
  Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, Xiaoguang Mao
Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming (14 Apr 2025)
  Zhiqiang He, Zhi Liu
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges (11 Apr 2025) [ViT]
  Joshua Fixelle
Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection (01 Apr 2025) [AAML]
  Shunxin Chen, Ajian Liu, Junze Zheng, Jun Wan, Kailai Peng, Sergio Escalera, Zhen Lei
Reasoning Beyond Limits: Advances and Open Problems for LLMs (26 Mar 2025) [ELM, OffRL, LRM, AI4CE]
  M. Ferrag, Norbert Tihanyi, Merouane Debbah
The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas (25 Mar 2025)
  Giovanni Franco Gabriel Marraffini, Andrés Cotton, Noe Fabian Hsueh, Axel Fridman, Juan Wisznia, Luciano Del Corro
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking (24 Mar 2025) [MoE]
  Wenrui Cai, Qingjie Liu, Yansen Wang
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications (10 Mar 2025) [MoE]
  Siyuan Mu, Sen Lin
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs (09 Mar 2025)
  Umberto Cappellazzo, Minsu Kim, Stavros Petridis
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models (03 Mar 2025) [MoMe, MoE]
  Y. Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen
Mixture of Experts for Recognizing Depression from Interview and Reading Tasks (27 Feb 2025)
  Loukas Ilias, D. Askounis
Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM (26 Feb 2025)
  Junxiao Ma, Jingjing Wang, Jiamin Luo, Peiying Yu, Guodong Zhou
Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos (26 Feb 2025)
  Jiamin Luo, Jingjing Wang, Junxiao Ma, Yujie Jin, Shoushan Li, Guodong Zhou
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models (24 Feb 2025) [MoE]
  Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz
The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE (24 Feb 2025)
  Andrei Chernov, Oleg Novitskij
Tight Clusters Make Specialized Experts (21 Feb 2025) [MoE]
  Stefan K. Nielsen, R. Teo, Laziz U. Abdullaev, Tan M. Nguyen
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM (10 Feb 2025)
  Qingshui Gu, Shu Li, Tianyu Zheng, Zhaoxiang Zhang
(GG) MoE vs. MLP on Tabular Data (05 Feb 2025) [BDL, MoE]
  Andrei Chernov
A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis (13 Jan 2025)
  Binyu Zhang, Shichao Li, Junpeng Jian, Zhu Meng, Limei Guo, Zhicheng Zhao
OneLLM: One Framework to Align All Modalities with Language (10 Jan 2025) [MLLM]
  Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity (28 Dec 2024)
  Jingbo Lin, Zhilu Zhang, Wenbo Li, Renjing Pei, Hang Xu, Hongzhi Zhang, Wangmeng Zuo
Complexity Experts are Task-Discriminative Learners for Any Image Restoration (27 Nov 2024) [MoE]
  Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, Radu Timofte
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning (21 Nov 2024)
  Jiange Yang, Haoyi Zhu, Yanjie Wang, Gangshan Wu, Tong He, Limin Wang
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts (30 Oct 2024) [DiffM]
  Jie Zhu, Yukang Chen, Mingyu Ding, Ping Luo, Leye Wang, Jingdong Wang
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts (18 Oct 2024) [MoE]
  R. Teo, Tan M. Nguyen
Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention (15 Oct 2024) [MoE]
  Pedram Akbarian, Huy Le Nguyen, Xing Han, Nhat Ho
MoH: Multi-Head Attention as Mixture-of-Head Attention (15 Oct 2024) [MoE]
  Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift (13 Oct 2024) [AI4TS]
  Yanru Sun, Zongxia Xie, Emadeldeen Eldele, Dongyue Chen, Q. Hu, Min-man Wu
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization (10 Oct 2024)
  Hongtao Wu, Yijun Yang, Angelica I Aviles-Rivero, Jingjing Ren, Sixiang Chen, Haoyu Chen, Lei Zhu
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models (08 Oct 2024)
  Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai, Li-Wei Li, Yiran Chen
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts (03 Oct 2024)
  Minh Le, Chau Nguyen, Huy Nguyen, Quyen Tran, Trung Le, Nhat Ho
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing (02 Oct 2024) [DiffM, MoE]
  Haotian Sun, Tao Lei, Bowen Zhang, Yanghao Li, Haoshuo Huang, Ruoming Pang, Bo Dai, Nan Du
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL (02 Oct 2024) [MoE]
  Ghada Sokar, J. Obando-Ceron, Rameswar Panda, Hugo Larochelle, Pablo Samuel Castro
Mastering Chess with a Transformer Model (18 Sep 2024)
  Daniel Monroe, The Leela Chess Zero Team
Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts (02 Sep 2024) [MoE]
  Youngseog Chung, Dhruv Malik, J. Schneider, Yuanzhi Li, Aarti Singh
La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection (23 Aug 2024) [AAML]
  Hang Zou, Chenxi Du, Hui Zhang, Yuan Zhang, Ajian Liu, Jun Wan, Zhen Lei
A Unified Framework for Iris Anti-Spoofing: Introducing Iris Anti-Spoofing Cross-Domain-Testing Protocol and Masked-MoE Method (19 Aug 2024)
  Hang Zou, Chenxi Du, Ajian Liu, Yuan Zhang, Jing Liu, Mingchuan Yang, Jun Wan, Hui Zhang, Zhenan Sun
Layerwise Recurrent Router for Mixture-of-Experts (13 Aug 2024) [MoE]
  Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration (15 Jul 2024) [DiffM]
  Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen
Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation (10 Jul 2024)
  Szymon Płotka, Maciej Chrabaszcz, Przemyslaw Biecek
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR (28 Jun 2024) [MoE]
  Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model (28 Jun 2024) [MoE]
  Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Tingting Gao, Xi Li
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory (18 Jun 2024) [MoE]
  Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu