ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
arXiv: 2211.15841 · 29 November 2022
Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia
Tags: MoE

Papers citing "MegaBlocks: Efficient Sparse Training with Mixture-of-Experts"

28 of 78 citing papers shown
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang
07 Apr 2024

Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G
Nassim Sehad, Lina Bariah, W. Hamidouche, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah
02 Apr 2024

Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz
Tags: MoMe, KELM
20 Mar 2024

Are LLMs Good Cryptic Crossword Solvers?
Abdelrahman Boda, Daria Kotova, Ekaterina Kochmar
15 Mar 2024

Scattered Mixture-of-Experts Implementation
Shawn Tan, Songlin Yang, Yikang Shen, Aaron Courville
Tags: MoE
13 Mar 2024

SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
T. Yasuda, Kyriakos Axiotis, Gang Fu, M. Bateni, Vahab Mirrokni
27 Feb 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo-Wen Zhang, Junchi Yan, Hongsheng Li
Tags: MoE
22 Feb 2024

Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, M. Nicolaou, Jiankang Deng, Ioannis Patras
Tags: MoE
19 Feb 2024

Turn Waste into Worth: Rectifying Top-$k$ Router of MoE
Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu
Tags: MoE, MoMe
17 Feb 2024

Mixtures of Experts Unlock Parameter Scaling for Deep RL
J. Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob N. Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
13 Feb 2024

Buffer Overflow in Mixture of Experts
Jamie Hayes, Ilia Shumailov, Itay Yona
Tags: MoE
08 Feb 2024

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
Tags: MoE
29 Jan 2024

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Huanjun Kong, Songyang Zhang, Jiaying Li, Min Xiao, Jun Xu, Kai-xiang Chen
Tags: VLM
16 Jan 2024

Mixtral of Experts
Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, A. Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
Tags: MoE, LLMAG
08 Jan 2024

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
23 Dec 2023

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, ..., Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
Tags: ALM, ELM
23 Dec 2023

Memory Augmented Language Models through Mixture of Word Experts
Cicero Nogueira dos Santos, James Lee-Thorp, Isaac Noble, Chung-Ching Chang, David C. Uthus
Tags: MoE
15 Nov 2023

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
Mohammad Zubair, Christoph Bauinger
01 Nov 2023

MOSEL: Inference Serving Using Dynamic Modality Selection
Bodun Hu, Le Xu, Jeongyoon Moon, N. Yadwadkar, Aditya Akella
27 Oct 2023

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar, Dan Alistarh
Tags: MQ, MoE
25 Oct 2023

Adaptive Gating in Mixture-of-Experts based Language Models
Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong-Yu Xu
Tags: MoE
11 Oct 2023

JaxPruner: A concise library for sparsity research
Jooyoung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, J. Obando-Ceron, ..., Hong-Seok Kim, Yann N. Dauphin, Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
27 Apr 2023

PopSparse: Accelerated block sparse matrix multiplication on IPU
Zhiyi Li, Douglas Orr, V. Ohan, Godfrey Da Costa, Tom Murray, Adam Sanders, D. Beker, Dominic Masters
29 Mar 2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Ningxin Zheng, Huiqiang Jiang, Quan Zhang, Zhenhua Han, Yuqing Yang, ..., Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou
26 Jan 2023

Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, ..., Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong
Tags: MoE
07 Jun 2022

Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
Tags: MoE
18 Feb 2022

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
Tags: AIMat
31 Dec 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE
17 Sep 2019