ResearchTrend.AI
Theory on Mixture-of-Experts in Continual Learning
arXiv: 2406.16437 · 20 February 2025
Hongbo Li, Sen Lin, Lingjie Duan, Yingbin Liang, Ness B. Shroff
Tags: MoE, MoMe, CLL

Papers citing "Theory on Mixture-of-Experts in Continual Learning"

47 / 47 papers shown
• Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming
  Zhiqiang He, Zhi Liu · 14 Apr 2025 · 76 / 0 / 0
• A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
  Siyuan Mu, Sen Lin · MoE · 10 Mar 2025 · 465 / 5 / 0
• Mechanism Design for Blockchain Order Books against Selfish Miners
  Yunshu Liu, Lingjie Duan · 22 Jan 2025 · 99 / 0 / 0
• Generate to Discriminate: Expert Routing for Continual Learning
  Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary Chase Lipton · 31 Dec 2024 · 140 / 0 / 0
• Algorithm Design for Continual Learning in IoT Networks
  Shugang Hao, Lingjie Duan · CLL · 22 Dec 2024 · 131 / 0 / 0
• A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
  Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Joey Tianyi Zhou, Leshem Choshen, Alessandro Sordoni · MoMe · 13 Aug 2024 · 92 / 24 / 0
• Mixture of A Million Experts
  Xu Owen He · MoE · 04 Jul 2024 · 81 / 31 / 0
• Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
  Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He · VLM, KELM, CLL, OODD · 18 Mar 2024 · 168 / 85 / 0
• OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You · MoE · 29 Jan 2024 · 77 / 99 / 0
• MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, ..., Jinfa Huang, Junwu Zhang, Yatian Pang, Munan Ning, Li-ming Yuan · VLM, MLLM, MoE · 29 Jan 2024 · 98 / 169 / 0
• LocMoE: A Low-Overhead MoE for Large Language Model Training
  Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen · MoE · 25 Jan 2024 · 84 / 13 / 0
• Divide and not forget: Ensemble of selectively trained experts in Continual Learning
  Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan, Tomasz Trzciński, Bartosz Zieliński, Bartłomiej Twardowski · CLL · 18 Jan 2024 · 62 / 31 / 0
• Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
  Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker · MoE · 11 Sep 2023 · 71 / 98 / 0
• Parameter-Level Soft-Masking for Continual Learning
  Tatsuya Konishi, M. Kurokawa, C. Ono, Zixuan Ke, Gyuhak Kim, Bin Liu · CLL · 26 Jun 2023 · 54 / 37 / 0
• The Ideal Continual Learner: An Agent That Never Forgets
  Liangzu Peng, Paris V. Giampouras, René Vidal · CLL · 29 Apr 2023 · 151 / 30 / 0
• Theory on Forgetting and Generalization of Continual Learning
  Sen Lin, Peizhong Ju, Yingbin Liang, Ness B. Shroff · CLL · 12 Feb 2023 · 79 / 45 / 0
• A Comprehensive Survey of Continual Learning: Theory, Method and Application
  Liyuan Wang, Xingxing Zhang, Hang Su, Jun Zhu · KELM, CLL · 31 Jan 2023 · 168 / 683 / 0
• CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One
  Liyuan Wang, Xingxing Zhang, Qian Li, Jun Zhu, Yi Zhong · CLL · 13 Jul 2022 · 64 / 49 / 0
• How catastrophic can catastrophic forgetting be in linear regression?
  Itay Evron, E. Moroshko, Rachel A. Ward, Nati Srebro, Daniel Soudry · CLL · 19 May 2022 · 73 / 52 / 0
• On the Representation Collapse of Sparse Mixture of Experts
  Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, ..., Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei · MoMe, MoE · 20 Apr 2022 · 71 / 105 / 0
• Continual Learning Beyond a Single Model
  T. Doan, Seyed Iman Mirzadeh, Mehrdad Farajtabar · CLL · 20 Feb 2022 · 62 / 16 / 0
• Mixture-of-Experts with Expert Choice Routing
  Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon · MoE · 18 Feb 2022 · 298 / 358 / 0
• TRGP: Trust Region Gradient Projection for Continual Learning
  Sen Lin, Li Yang, Deliang Fan, Junshan Zhang · CLL · 07 Feb 2022 · 127 / 77 / 0
• Continual Learning with Recursive Gradient Optimization
  Hao Liu, Huaping Liu · VLM, CLL · 29 Jan 2022 · 129 / 37 / 0
• GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, ..., Kun Zhang, Quoc V. Le, Yonghui Wu, Zhiwen Chen, Claire Cui · ALM, MoE · 13 Dec 2021 · 216 / 813 / 0
• Specializing Versatile Skill Libraries using Local Mixture of Experts
  Onur Celik, Dongzhuoran Zhou, Gen Li, P. Becker, Gerhard Neumann · 08 Dec 2021 · 61 / 37 / 0
• Mixture-of-Variational-Experts for Continual Learning
  Y. Yin, Yu Wang · CLL, FedML · 25 Oct 2021 · 46 / 6 / 0
• Continual Learning in the Teacher-Student Setup: Impact of Task Similarity
  Sebastian Lee, Sebastian Goldt, Andrew M. Saxe · CLL · 09 Jul 2021 · 72 / 74 / 0
• Scaling Vision with Sparse Mixture of Experts
  C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby · MoE · 10 Jun 2021 · 112 / 606 / 0
• Layerwise Optimization by Gradient Decomposition for Continual Learning
  Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu, Wanli Ouyang · CLL · 17 May 2021 · 64 / 65 / 0
• Gradient Projection Memory for Continual Learning
  Gobinda Saha, Isha Garg, Kaushik Roy · VLM, CLL · 17 Mar 2021 · 78 / 283 / 0
• Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  W. Fedus, Barret Zoph, Noam M. Shazeer · MoE · 11 Jan 2021 · 88 / 2,187 / 0
• A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix
  T. Doan, Mehdi Abbana Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre Alquier · CLL · 07 Oct 2020 · 70 / 83 / 0
• Gradient-based Editing of Memory Examples for Online Task-free Continual Learning
  Xisen Jin, Arka Sadhu, Junyi Du, Xiang Ren · CLL, KELM, BDL · 27 Jun 2020 · 64 / 98 / 0
• Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent
  Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama · CLL · 21 Jun 2020 · 87 / 62 / 0
• Knowledge Distillation: A Survey
  Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao · VLM · 09 Jun 2020 · 105 / 2,960 / 0
• Orthogonal Gradient Descent for Continual Learning
  Mehrdad Farajtabar, Navid Azizan, Alex Mott, Ang Li · CLL · 15 Oct 2019 · 96 / 369 / 0
• Scalable and Order-robust Continual Learning with Additive Parameter Decomposition
  Jaehong Yoon, Saehoon Kim, Eunho Yang, Sung Ju Hwang · CLL · 25 Feb 2019 · 70 / 177 / 0
• Efficient Lifelong Learning with A-GEM
  Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny · CLL · 02 Dec 2018 · 210 / 1,456 / 0
• Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting
  H. Ritter, Aleksandar Botev, David Barber · BDL, CLL · 20 May 2018 · 86 / 331 / 0
• Characterizing Implicit Bias in Terms of Optimization Geometry
  Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro · AI4CE · 22 Feb 2018 · 73 / 410 / 0
• Continual Lifelong Learning with Neural Networks: A Review
  G. I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, S. Wermter · KELM, CLL · 21 Feb 2018 · 193 / 2,888 / 0
• Overcoming catastrophic forgetting with hard attention to the task
  Joan Serrà, Dídac Surís, M. Miron, Alexandros Karatzoglou · CLL · 04 Jan 2018 · 106 / 1,079 / 0
• Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean · MoE · 23 Jan 2017 · 251 / 2,653 / 0
• Overcoming catastrophic forgetting in neural networks
  J. Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, J. Veness, Guillaume Desjardins, ..., A. Grabska-Barwinska, Demis Hassabis, Claudia Clopath, D. Kumaran, R. Hadsell · CLL · 02 Dec 2016 · 369 / 7,518 / 0
• Learning Factored Representations in a Deep Mixture of Experts
  David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever · MoE · 16 Dec 2013 · 84 / 374 / 0
• Tensor decompositions for learning latent variable models
  Anima Anandkumar, Rong Ge, Daniel J. Hsu, Sham Kakade, Matus Telgarsky · 29 Oct 2012 · 440 / 1,145 / 0