From Sparse to Soft Mixtures of Experts (arXiv:2308.00951, v2 latest) [MoE]
2 August 2023
J. Puigcerver, C. Riquelme, Basil Mustafa, N. Houlsby
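For context on what the cited paper proposes: Soft MoE replaces hard, sparse token-to-expert assignment with fully differentiable soft routing, where each expert slot receives a convex combination of all input tokens and each output token is a convex combination of all slot outputs. The sketch below is a minimal, illustrative NumPy version of that routing step; the function names, shapes, and identity "experts" are assumptions for illustration, not the authors' reference implementation.

    import numpy as np

    def softmax(z, axis):
        z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def soft_moe(x, phi, experts, slots_per_expert):
        # x: (n_tokens, d); phi: (d, n_experts * slots_per_expert); experts: callables (s, d) -> (s, d)
        logits = x @ phi                    # (n_tokens, n_slots) token-slot affinities
        dispatch = softmax(logits, axis=0)  # normalize over tokens: each slot is a convex mix of tokens
        combine = softmax(logits, axis=1)   # normalize over slots: each token mixes all slot outputs
        slot_in = dispatch.T @ x            # (n_slots, d) soft slot inputs
        slot_out = np.concatenate(
            [f(slot_in[i * slots_per_expert:(i + 1) * slots_per_expert])
             for i, f in enumerate(experts)],
            axis=0,
        )
        return combine @ slot_out           # (n_tokens, d)

    # Toy usage: 16 tokens of width 8, 4 experts with 2 slots each, identity experts.
    x = np.random.randn(16, 8)
    phi = np.random.randn(8, 4 * 2)
    experts = [lambda s: s for _ in range(4)]
    print(soft_moe(x, phi, experts, slots_per_expert=2).shape)  # (16, 8)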

Papers citing "From Sparse to Soft Mixtures of Experts"
(50 of 90 papers shown)
Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning (20 Jun 2025) [OffRL]
  Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao
Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models (19 Jun 2025) [MoE]
  Daniel Fidel Harvey, George Weale, Berk Yilmaz
NaSh: Guardrails for an LLM-Powered Natural Language Shell (16 Jun 2025)
  Bimal Raj Gyawali, Saikrishna Achalla, Konstantinos Kallas, Sam Kumar
Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control (14 Jun 2025)
  Rongpeng Li, Jianhang Zhu, Jiahao Huang, Zhifeng Zhao, Honggang Zhang
Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders (27 May 2025) [MoE]
  James Oldfield, Shawn Im, Yixuan Li, M. Nicolaou, Ioannis Patras, Grigorios G. Chrysos
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning (23 May 2025)
  Ghada Sokar, Pablo Samuel Castro
Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection (24 Apr 2025)
  Haokai Zhang, Shengtao Zhang, Zijian Cai, Heng Wang, Ruixuan Zhu, Zinan Zeng, Minnan Luo
Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection (23 Apr 2025) [SILM]
  Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, Xiaoguang Mao
Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming (14 Apr 2025)
  Zhiqiang He, Zhi Liu
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges (11 Apr 2025) [ViT]
  Joshua Fixelle
Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection (01 Apr 2025) [AAML]
  Shunxin Chen, Ajian Liu, Junze Zheng, Jun Wan, Kailai Peng, Sergio Escalera, Zhen Lei
Reasoning Beyond Limits: Advances and Open Problems for LLMs (26 Mar 2025) [ELM, OffRL, LRM, AI4CE]
  M. Ferrag, Norbert Tihanyi, Merouane Debbah
The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas (25 Mar 2025)
  Giovanni Franco Gabriel Marraffini, Andrés Cotton, Noe Fabian Hsueh, Axel Fridman, Juan Wisznia, Luciano Del Corro
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking (24 Mar 2025) [MoE]
  Wenrui Cai, Qingjie Liu, Yansen Wang
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications (10 Mar 2025) [MoE]
  Siyuan Mu, Sen Lin
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs (09 Mar 2025)
  Umberto Cappellazzo, Minsu Kim, Stavros Petridis
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models (03 Mar 2025) [MoMe, MoE]
  Y. Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen
Mixture of Experts for Recognizing Depression from Interview and Reading Tasks (27 Feb 2025)
  Loukas Ilias, D. Askounis
Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM (26 Feb 2025)
  Junxiao Ma, Jingjing Wang, Jiamin Luo, Peiying Yu, Guodong Zhou
Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos (26 Feb 2025)
  Jiamin Luo, Jingjing Wang, Junxiao Ma, Yujie Jin, Shoushan Li, Guodong Zhou
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models (24 Feb 2025) [MoE]
  Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz
The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE (24 Feb 2025)
  Andrei Chernov, Oleg Novitskij
Tight Clusters Make Specialized Experts (21 Feb 2025) [MoE]
  Stefan K. Nielsen, R. Teo, Laziz U. Abdullaev, Tan M. Nguyen
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM (10 Feb 2025)
  Qingshui Gu, Shu Li, Tianyu Zheng, Zhaoxiang Zhang
(GG) MoE vs. MLP on Tabular Data (05 Feb 2025) [BDL, MoE]
  Andrei Chernov
A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis (13 Jan 2025)
  Binyu Zhang, Shichao Li, Junpeng Jian, Zhu Meng, Limei Guo, Zhicheng Zhao
OneLLM: One Framework to Align All Modalities with Language (10 Jan 2025) [MLLM]
  Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity (28 Dec 2024)
  Jingbo Lin, Zhilu Zhang, Wenbo Li, Renjing Pei, Hang Xu, Hongzhi Zhang, Wangmeng Zuo
Complexity Experts are Task-Discriminative Learners for Any Image Restoration (27 Nov 2024) [MoE]
  Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, Radu Timofte
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning (21 Nov 2024)
  Jiange Yang, Haoyi Zhu, Yanjie Wang, Gangshan Wu, Tong He, Limin Wang
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts (30 Oct 2024) [DiffM]
  Jie Zhu, Yukang Chen, Mingyu Ding, Ping Luo, Leye Wang, Jingdong Wang
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts (18 Oct 2024) [MoE]
  R. Teo, Tan M. Nguyen
Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention (15 Oct 2024) [MoE]
  Pedram Akbarian, Huy Le Nguyen, Xing Han, Nhat Ho
MoH: Multi-Head Attention as Mixture-of-Head Attention (15 Oct 2024) [MoE]
  Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift (13 Oct 2024) [AI4TS]
  Yanru Sun, Zongxia Xie, Emadeldeen Eldele, Dongyue Chen, Q. Hu, Min-man Wu
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization (10 Oct 2024)
  Hongtao Wu, Yijun Yang, Angelica I Aviles-Rivero, Jingjing Ren, Sixiang Chen, Haoyu Chen, Lei Zhu
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models (08 Oct 2024)
  Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai, Li-Wei Li, Yiran Chen
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts (03 Oct 2024)
  Minh Le, Chau Nguyen, Huy Nguyen, Quyen Tran, Trung Le, Nhat Ho
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing (02 Oct 2024) [DiffM, MoE]
  Haotian Sun, Tao Lei, Bowen Zhang, Yanghao Li, Haoshuo Huang, Ruoming Pang, Bo Dai, Nan Du
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL (02 Oct 2024) [MoE]
  Ghada Sokar, J. Obando-Ceron, Rameswar Panda, Hugo Larochelle, Pablo Samuel Castro
Mastering Chess with a Transformer Model (18 Sep 2024)
  Daniel Monroe, The Leela Chess Zero Team
Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts (02 Sep 2024) [MoE]
  Youngseog Chung, Dhruv Malik, J. Schneider, Yuanzhi Li, Aarti Singh
La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection (23 Aug 2024) [AAML]
  Hang Zou, Chenxi Du, Hui Zhang, Yuan Zhang, Ajian Liu, Jun Wan, Zhen Lei
A Unified Framework for Iris Anti-Spoofing: Introducing Iris Anti-Spoofing Cross-Domain-Testing Protocol and Masked-MoE Method (19 Aug 2024)
  Hang Zou, Chenxi Du, Ajian Liu, Yuan Zhang, Jing Liu, Mingchuan Yang, Jun Wan, Hui Zhang, Zhenan Sun
Layerwise Recurrent Router for Mixture-of-Experts (13 Aug 2024) [MoE]
  Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration (15 Jul 2024) [DiffM]
  Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen
Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation (10 Jul 2024)
  Szymon Płotka, Maciej Chrabaszcz, Przemyslaw Biecek
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR (28 Jun 2024) [MoE]
  Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model (28 Jun 2024) [MoE]
  Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Tingting Gao, Xi Li
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory (18 Jun 2024) [MoE]
  Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu