arXiv: 2310.09762 · Cited By
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
15 October 2023 · MoE
Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
Papers citing "Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer" (30 of 30 papers shown):
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
  Andy Zhou · MoMe · 133 / 0 / 0 · 13 Mar 2025

Spatio-Temporal Multi-Subgraph GCN for 3D Human Motion Prediction
  Jiexin Wang, Yiju Guo, Fuchun Sun · 3DH · 104 / 1 / 0 · 03 Jan 2025

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks
  Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shi-Yong Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao · 77 / 33 / 0 · 01 Mar 2023

Improved Training of Mixture-of-Experts Language GANs
  Yekun Chai, Qiyue Yin, Junge Zhang · GAN · 48 / 5 / 0 · 23 Feb 2023

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE
  Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, ..., Yixin Chen, Xinbo Gao, Steven C. H. Hoi, Xiaoou Tang, Dacheng Tao · VLM, ELM · 97 / 35 / 0 · 04 Dec 2022

On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation
  Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, Dacheng Tao · 64 / 21 / 0 · 07 Sep 2022

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
  Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao · 69 / 27 / 0 · 30 May 2022

A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis
  Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, Dacheng Tao · 59 / 32 / 0 · 16 Apr 2022

Mixture-of-Experts with Expert Choice Routing
  Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon · MoE · 298 / 358 / 0 · 18 Feb 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
  Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He · 106 / 298 / 0 · 14 Jan 2022

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, ..., Kun Zhang, Quoc V. Le, Yonghui Wu, Zhiwen Chen, Claire Cui · ALM, MoE · 216 / 813 / 0 · 13 Dec 2021

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning
  Zixuan Ke, Bing-Quan Liu, Nianzu Ma, Hu Xu, Lei Shu · CLL · 217 / 125 / 0 · 05 Dec 2021

Scaling Vision with Sparse Mixture of Experts
  C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby · MoE · 112 / 606 / 0 · 10 Jun 2021

Hash Layers For Large Sparse Models
  Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston · MoE · 181 / 210 / 0 · 08 Jun 2021

BASE Layers: Simplifying Training of Large, Sparse Models
  M. Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer · MoE · 194 / 281 / 0 · 30 Mar 2021

FastMoE: A Fast Mixture-of-Expert Training System
  Jiaao He, J. Qiu, Aohan Zeng, Zhilin Yang, Jidong Zhai, Jie Tang · ALM, MoE · 86 / 99 / 0 · 24 Mar 2021

Towards Efficiently Diversifying Dialogue Generation via Embedding Augmentation
  Yu Cao, Liang Ding, Zhiliang Tian, Meng Fang · 63 / 14 / 0 · 02 Mar 2021

Understanding and Improving Lexical Choice in Non-Autoregressive Translation
  Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu · 139 / 77 / 0 · 29 Dec 2020

Efficient Meta Lifelong-Learning with Limited Memory
  Zirui Wang, Sanket Vaibhav Mehta, Barnabás Póczós, J. Carbonell · CLL, KELM · 70 / 76 / 0 · 06 Oct 2020

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
  Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Zhiwen Chen · MoE · 103 / 1,165 / 0 · 30 Jun 2020

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
  Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut · SSL, AIMat · 368 / 6,455 / 0 · 26 Sep 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov · AIMat · 665 / 24,464 / 0 · 26 Jul 2019

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
  Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova · 227 / 1,527 / 0 · 24 May 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
  Alex Jinpeng Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 274 / 2,315 / 0 · 02 May 2019

Continual Learning of Context-dependent Processing in Neural Networks
  Guanxiong Zeng, Yang Chen, Bo Cui, Shan Yu · CLL · 82 / 308 / 0 · 29 Sep 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 1.1K / 7,182 / 0 · 20 Apr 2018

Encoder Based Lifelong Learning
  Amal Rannen Triki, Rahaf Aljundi, Mathew B. Blaschko, Tinne Tuytelaars · CLL · 100 / 321 / 0 · 06 Apr 2017

iCaRL: Incremental Classifier and Representation Learning
  Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, G. Sperl, Christoph H. Lampert · CLL, OOD · 154 / 3,761 / 0 · 23 Nov 2016

Expert Gate: Lifelong Learning with a Network of Experts
  Rahaf Aljundi, Punarjay Chakravarty, Tinne Tuytelaars · CLL · 80 / 661 / 0 · 18 Nov 2016

SQuAD: 100,000+ Questions for Machine Comprehension of Text
  Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang · RALM · 286 / 8,134 / 0 · 16 Jun 2016