MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting

25 June 2024
Tianhao Li
Shangjie Li
Binbin Xie
Deyi Xiong
Baosong Yang
    CLL

Papers cited by "MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting"

Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
868
12,916
0
04 Mar 2022
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Nan Du
Yanping Huang
Andrew M. Dai
Simon Tong
Dmitry Lepikhin
...
Kun Zhang
Quoc V. Le
Yonghui Wu
Zhifeng Chen
Claire Cui
ALM
MoE
209
812
0
13 Dec 2021
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta
Yanping Huang
Ankur Bapna
M. Krikun
Dmitry Lepikhin
Minh-Thang Luong
Orhan Firat
MoE
245
109
0
24 Sep 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
398
10,301
0
17 Jun 2021
Continual Learning for Text Classification with Information Disentanglement Based Regularization
Yufan Huang
Yanzhe Zhang
Jiaao Chen
Xuezhi Wang
Diyi Yang
CLL
59
111
0
12 Apr 2021
Continual Lifelong Learning in Natural Language Processing: A Survey
Magdalena Biesialska
Katarzyna Biesialska
Marta R. Costa-jussá
KELM
CLL
81
219
0
17 Dec 2020
Meta-Learning with Sparse Experience Replay for Lifelong Language Learning
Nithin Holla
Pushkar Mishra
H. Yannakoudakis
Ekaterina Shutova
KELM
CLL
43
21
0
10 Sep 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhifeng Chen
MoE
86
1,162
0
30 Jun 2020
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
Edoardo Ponti
Goran Glavaš
Olga Majewska
Qianchu Liu
Ivan Vulić
Anna Korhonen
LRM
61
321
0
01 May 2020
BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
Yeming Wen
Dustin Tran
Jimmy Ba
OOD
FedML
UQCV
162
492
0
17 Feb 2020
LAMOL: LAnguage MOdeling for Lifelong Language Learning
Fan-Keng Sun
Cheng-Hao Ho
Hung-yi Lee
CLL
KELM
85
208
0
07 Sep 2019
Efficient Lifelong Learning with A-GEM
Arslan Chaudhry
Marc'Aurelio Ranzato
Marcus Rohrbach
Mohamed Elhoseiny
CLL
199
1,453
0
02 Dec 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
204
11,546
0
15 Feb 2018
Deep Learning Scaling is Predictable, Empirically
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou
89
739
0
01 Dec 2017
Lifelong Learning with Dynamically Expandable Networks
Jaehong Yoon
Eunho Yang
Jeongtae Lee
Sung Ju Hwang
CLL
121
1,222
0
04 Aug 2017
Get To The Point: Summarization with Pointer-Generator Networks
A. See
Peter J. Liu
Christopher D. Manning
3DPC
285
4,019
0
14 Apr 2017
iCaRL: Incremental Classifier and Representation Learning
Sylvestre-Alvise Rebuffi
Alexander Kolesnikov
G. Sperl
Christoph H. Lampert
CLL
OOD
139
3,754
0
23 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhifeng Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
891
6,788
0
26 Sep 2016
Learning without Forgetting
Zhizhong Li
Derek Hoiem
CLL
OOD
SSL
292
4,402
0
29 Jun 2016
A Neural Attention Model for Abstractive Sentence Summarization
Alexander M. Rush
S. Chopra
Jason Weston
CVBM
182
2,700
0
02 Sep 2015
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
1.0K
23,338
0
03 Jun 2014