ResearchTrend.AI
Theory on Mixture-of-Experts in Continual Learning
arXiv: 2406.16437 · 20 February 2025
Hongbo Li, Sen Lin, Lingjie Duan, Yingbin Liang, Ness B. Shroff
Tags: MoE, MoMe, CLL

Papers citing "Theory on Mixture-of-Experts in Continual Learning"

47 / 47 papers shown
• Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming
  Zhiqiang He, Zhi Liu · 14 Apr 2025 · 76 / 0 / 0
• A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
  Siyuan Mu, Sen Lin · MoE · 10 Mar 2025 · 465 / 5 / 0
• Mechanism Design for Blockchain Order Books against Selfish Miners
  Yunshu Liu, Lingjie Duan · 22 Jan 2025 · 99 / 0 / 0
• Generate to Discriminate: Expert Routing for Continual Learning
  Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary Chase Lipton · 31 Dec 2024 · 140 / 0 / 0
• Algorithm Design for Continual Learning in IoT Networks
  Shugang Hao, Lingjie Duan · CLL · 22 Dec 2024 · 131 / 0 / 0
• A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
  Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Joey Tianyi Zhou, Leshem Choshen, Alessandro Sordoni · MoMe · 13 Aug 2024 · 92 / 24 / 0
• Mixture of A Million Experts
  Xu Owen He · MoE · 04 Jul 2024 · 81 / 31 / 0
• Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
  Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He · VLM, KELM, CLL, OODD · 18 Mar 2024 · 168 / 85 / 0
• OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You · MoE · 29 Jan 2024 · 77 / 99 / 0
• MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, ..., Jinfa Huang, Junwu Zhang, Yatian Pang, Munan Ning, Li-ming Yuan · VLM, MLLM, MoE · 29 Jan 2024 · 98 / 169 / 0
• LocMoE: A Low-Overhead MoE for Large Language Model Training
  Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen · MoE · 25 Jan 2024 · 84 / 13 / 0
• Divide and not forget: Ensemble of selectively trained experts in Continual Learning
  Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan, Tomasz Trzciński, Bartosz Zieliński, Bartłomiej Twardowski · CLL · 18 Jan 2024 · 62 / 31 / 0
• Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
  Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker · MoE · 11 Sep 2023 · 71 / 98 / 0
• Parameter-Level Soft-Masking for Continual Learning
  Tatsuya Konishi, M. Kurokawa, C. Ono, Zixuan Ke, Gyuhak Kim, Bin Liu · CLL · 26 Jun 2023 · 54 / 37 / 0
• The Ideal Continual Learner: An Agent That Never Forgets
  Liangzu Peng, Paris V. Giampouras, René Vidal · CLL · 29 Apr 2023 · 151 / 30 / 0
• Theory on Forgetting and Generalization of Continual Learning
  Sen Lin, Peizhong Ju, Yingbin Liang, Ness B. Shroff · CLL · 12 Feb 2023 · 79 / 45 / 0
• A Comprehensive Survey of Continual Learning: Theory, Method and Application
  Liyuan Wang, Xingxing Zhang, Hang Su, Jun Zhu · KELM, CLL · 31 Jan 2023 · 168 / 683 / 0
• CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One
  Liyuan Wang, Xingxing Zhang, Qian Li, Jun Zhu, Yi Zhong · CLL · 13 Jul 2022 · 64 / 49 / 0
• How catastrophic can catastrophic forgetting be in linear regression?
  Itay Evron, E. Moroshko, Rachel A. Ward, Nati Srebro, Daniel Soudry · CLL · 19 May 2022 · 73 / 52 / 0
• On the Representation Collapse of Sparse Mixture of Experts
  Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, ..., Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei · MoMe, MoE · 20 Apr 2022 · 71 / 105 / 0
• Continual Learning Beyond a Single Model
  T. Doan, Seyed Iman Mirzadeh, Mehrdad Farajtabar · CLL · 20 Feb 2022 · 62 / 16 / 0
• Mixture-of-Experts with Expert Choice Routing
  Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon · MoE · 18 Feb 2022 · 298 / 358 / 0
• TRGP: Trust Region Gradient Projection for Continual Learning
  Sen Lin, Li Yang, Deliang Fan, Junshan Zhang · CLL · 07 Feb 2022 · 127 / 77 / 0
• Continual Learning with Recursive Gradient Optimization
  Hao Liu, Huaping Liu · VLM, CLL · 29 Jan 2022 · 129 / 37 / 0
• GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, ..., Kun Zhang, Quoc V. Le, Yonghui Wu, Zhiwen Chen, Claire Cui · ALM, MoE · 13 Dec 2021 · 216 / 813 / 0
• Specializing Versatile Skill Libraries using Local Mixture of Experts
  Onur Celik, Dongzhuoran Zhou, Gen Li, P. Becker, Gerhard Neumann · 08 Dec 2021 · 61 / 37 / 0
• Mixture-of-Variational-Experts for Continual Learning
  Y. Yin, Yu Wang · CLL, FedML · 25 Oct 2021 · 46 / 6 / 0
• Continual Learning in the Teacher-Student Setup: Impact of Task Similarity
  Sebastian Lee, Sebastian Goldt, Andrew M. Saxe · CLL · 09 Jul 2021 · 72 / 74 / 0
• Scaling Vision with Sparse Mixture of Experts
  C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby · MoE · 10 Jun 2021 · 112 / 606 / 0
• Layerwise Optimization by Gradient Decomposition for Continual Learning
  Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu, Wanli Ouyang · CLL · 17 May 2021 · 64 / 65 / 0
• Gradient Projection Memory for Continual Learning
  Gobinda Saha, Isha Garg, Kaushik Roy · VLM, CLL · 17 Mar 2021 · 78 / 283 / 0
• Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  W. Fedus, Barret Zoph, Noam M. Shazeer · MoE · 11 Jan 2021 · 88 / 2,187 / 0
• A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix
  T. Doan, Mehdi Abbana Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre Alquier · CLL · 07 Oct 2020 · 70 / 83 / 0
• Gradient-based Editing of Memory Examples for Online Task-free Continual Learning
  Xisen Jin, Arka Sadhu, Junyi Du, Xiang Ren · CLL, KELM, BDL · 27 Jun 2020 · 64 / 98 / 0
• Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent
  Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama · CLL · 21 Jun 2020 · 87 / 62 / 0
• Knowledge Distillation: A Survey
  Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao · VLM · 09 Jun 2020 · 105 / 2,960 / 0
• Orthogonal Gradient Descent for Continual Learning
  Mehrdad Farajtabar, Navid Azizan, Alex Mott, Ang Li · CLL · 15 Oct 2019 · 96 / 369 / 0
• Scalable and Order-robust Continual Learning with Additive Parameter Decomposition
  Jaehong Yoon, Saehoon Kim, Eunho Yang, Sung Ju Hwang · CLL · 25 Feb 2019 · 70 / 177 / 0
• Efficient Lifelong Learning with A-GEM
  Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny · CLL · 02 Dec 2018 · 210 / 1,456 / 0
• Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting
  H. Ritter, Aleksandar Botev, David Barber · BDL, CLL · 20 May 2018 · 86 / 331 / 0
• Characterizing Implicit Bias in Terms of Optimization Geometry
  Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro · AI4CE · 22 Feb 2018 · 73 / 410 / 0
• Continual Lifelong Learning with Neural Networks: A Review
  G. I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, S. Wermter · KELM, CLL · 21 Feb 2018 · 193 / 2,888 / 0
• Overcoming catastrophic forgetting with hard attention to the task
  Joan Serrà, Dídac Surís, M. Miron, Alexandros Karatzoglou · CLL · 04 Jan 2018 · 106 / 1,079 / 0
• Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean · MoE · 23 Jan 2017 · 251 / 2,653 / 0
• Overcoming catastrophic forgetting in neural networks
  J. Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, J. Veness, Guillaume Desjardins, ..., A. Grabska-Barwinska, Demis Hassabis, Claudia Clopath, D. Kumaran, R. Hadsell · CLL · 02 Dec 2016 · 369 / 7,518 / 0
• Learning Factored Representations in a Deep Mixture of Experts
  David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever · MoE · 16 Dec 2013 · 84 / 374 / 0
• Tensor decompositions for learning latent variable models
  Anima Anandkumar, Rong Ge, Daniel J. Hsu, Sham Kakade, Matus Telgarsky · 29 Oct 2012 · 440 / 1,145 / 0