Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2502.06643
Cited By
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
10 February 2025
Seokjin Go
Divya Mahajan
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing"
10 / 10 papers shown
Title
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
94
8
0
07 Apr 2024
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li
Zhenyu Zhang
Prateek Yadav
Yi-Lin Sung
Yu Cheng
Mohit Bansal
Tianlong Chen
MoMe
48
37
0
02 Oct 2023
Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang
Wei Cui
Yifan Xiong
Ziyue Yang
Ze Liu
...
Joe Chau
Peng Cheng
Fan Yang
Mao Yang
Y. Xiong
MoE
139
115
0
07 Jun 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari
Conglong Li
Z. Yao
Minjia Zhang
Reza Yazdani Aminabadi
A. A. Awan
Jeff Rasley
Yuxiong He
55
292
0
14 Jan 2022
FastMoE: A Fast Mixture-of-Expert Training System
Jiaao He
J. Qiu
Aohan Zeng
Zhilin Yang
Jidong Zhai
Jie Tang
ALM
MoE
67
98
0
24 Mar 2021
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
500
41,106
0
28 May 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
280
42,038
0
03 Dec 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
290
1,861
0
17 Sep 2019
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
453
129,831
0
12 Jun 2017
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
168
2,814
0
26 Sep 2016
1