BASE Layers: Simplifying Training of Large, Sparse Models

30 March 2021 · arXiv 2103.16716
M. Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer
Tags: MoE
Links: arXiv · PDF · HTML
Papers citing "BASE Layers: Simplifying Training of Large, Sparse Models"

Showing 50 of 208 citing papers (title, authors, tags, date).
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (24 Jun 2024)
  Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng
  Tags: MoE, ALM

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models (19 Jun 2024)
  Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng
  Tags: MoE

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory (18 Jun 2024)
  Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu
  Tags: MoE

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts (17 Jun 2024)
  Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng
  Tags: MoE

Flexible and Adaptable Summarization via Expertise Separation (08 Jun 2024)
  Preslav Nakov, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang
  Tags: MoE

MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks (07 Jun 2024)
  Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai
  Tags: MoE
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors (29 May 2024)
  Renzhi Wang, Piji Li
  Tags: KELM

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts (26 May 2024)
  Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers
  Tags: MoE

Mixture of In-Context Prompters for Tabular PFNs (25 May 2024)
  Derek Xu, Olcay Cirit, Reza Asadi, Yizhou Sun, Wei Wang

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts (18 May 2024)
  Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min-Ling Zhang
  Tags: MoE

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts (09 May 2024)
  Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen
  Tags: MLLM, MoE

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training (06 May 2024)
  Zexuan Zhong, Mengzhou Xia, Danqi Chen, Mike Lewis
  Tags: MoE

MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts (02 May 2024)
  Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Chi Xu
Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A Survey (25 Apr 2024)
  Minrui Xu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Yuguang Fang, Dong In Kim, Xuemin Shen

A Survey on Efficient Inference for Large Language Models (22 Apr 2024)
  Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models (12 Apr 2024)
  Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini

MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection (12 Apr 2024)
  Chenqi Kong, Anwei Luo, Song Xia, Yi Yu, Haoliang Li, Zengwei Zheng, Shiqi Wang, Alex C. Kot
  Tags: MoE

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models (08 Apr 2024)
  Bowen Pan, Songlin Yang, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Yikang Shen
  Tags: MoE

Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts (07 Apr 2024)
  Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts (14 Mar 2024)
  Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim
  Tags: DiffM, MoE

Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts (13 Mar 2024)
  Shengzhuang Chen, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, Ying Wei
  Tags: MoE

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM (12 Mar 2024)
  Sainbayar Sukhbaatar, O. Yu. Golovneva, Vasu Sharma, Hu Xu, Xi Lin, ..., Jacob Kahn, Shang-Wen Li, Wen-tau Yih, Jason Weston, Xian Li
  Tags: MoMe, OffRL, MoE

Harder Tasks Need More Experts: Dynamic Routing in MoE Models (12 Mar 2024)
  Quzhe Huang, Zhenwei An, Zhuang Nan, Mingxu Tao, Chen Zhang, ..., Kun Xu, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng
  Tags: MoE

A Question-centric Multi-experts Contrastive Learning Framework for Improving the Accuracy and Interpretability of Deep Sequential Knowledge Tracing Models (12 Mar 2024)
  Hengyuan Zhang, Zitao Liu, Chenming Shang, Dawei Li, Yong Jiang
  Tags: AI4Ed

Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models (06 Mar 2024)
  Wenfeng Feng, Chuzhan Hao, Yuewei Zhang, Yu Han, Hao Wang
  Tags: ALM, MoE
XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection (27 Feb 2024)
  Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu
  Tags: MoE

Towards an empirical understanding of MoE design choices (20 Feb 2024)
  Dongyang Fan, Bettina Messmer, Martin Jaggi

MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models (20 Feb 2024)
  Tongxu Luo, Jiahe Lei, Fangyu Lei, Weihao Liu, Shizhu He, Jun Zhao, Kang Liu
  Tags: MoE, ALM

Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization (19 Feb 2024)
  James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, M. Nicolaou, Jiankang Deng, Ioannis Patras
  Tags: MoE

Turn Waste into Worth: Rectifying Top-k Router of MoE (17 Feb 2024)
  Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu
  Tags: MoE, MoMe
Model Compression and Efficient Inference for Large Language Models: A Survey (15 Feb 2024)
  Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
  Tags: MQ

Mixtures of Experts Unlock Parameter Scaling for Deep RL (13 Feb 2024)
  J. Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob N. Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

Scaling Laws for Fine-Grained Mixture of Experts (12 Feb 2024)
  Jakub Krajewski, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michal Krutul, ..., Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, Sebastian Jaszczur
  Tags: MoE

Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts (08 Feb 2024)
  Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok
  Tags: MoE

A Survey on Transformer Compression (05 Feb 2024)
  Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao

InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts (05 Feb 2024)
  Vinitra Swamy, Syrielle Montariol, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser

CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition (04 Feb 2024)
  Quang Pham, Giang Do, Huy Nguyen, TrungTin Nguyen, Chenghao Liu, ..., Binh T. Nguyen, Savitha Ramasamy, Xiaoli Li, Steven C. H. Hoi, Nhat Ho
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts (01 Feb 2024)
  Anke Tang, Li Shen, Yong Luo, Nan Yin, Lefei Zhang, Dacheng Tao
  Tags: MoMe

MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts (31 Jan 2024)
  Zhitian Xie, Yinger Zhang, Chenyi Zhuang, Qitao Shi, Zhining Liu, Jinjie Gu, Guannan Zhang
  Tags: MoE

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models (29 Jan 2024)
  Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
  Tags: MoE

Routers in Vision Mixture of Experts: An Empirical Study (29 Jan 2024)
  Tianlin Liu, Mathieu Blondel, C. Riquelme, J. Puigcerver
  Tags: MoE

LocMoE: A Low-Overhead MoE for Large Language Model Training (25 Jan 2024)
  Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen
  Tags: MoE

Fast Inference of Mixture-of-Experts Language Models with Offloading (28 Dec 2023)
  Artyom Eliseev, Denis Mazur
  Tags: MoE

Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss (27 Dec 2023)
  Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation (27 Dec 2023)
  Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, ..., Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao
Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference (15 Dec 2023)
  Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention (13 Dec 2023)
  Róbert Csordás, Piotr Piekos, Kazuki Irie, Jürgen Schmidhuber
  Tags: MoE

Learning to Skip for Language Modeling (26 Nov 2023)
  Dewen Zeng, Nan Du, Tao Wang, Yuanzhong Xu, Tao Lei, Zhifeng Chen, Claire Cui

HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts (23 Nov 2023)
  Do Huu Dat, Po Yuan Mao, Tien Hoang Nguyen, Wray Buntine, Bennamoun

SiRA: Sparse Mixture of Low Rank Adaptation (15 Nov 2023)
  Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, ..., Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng
  Tags: MoE