arXiv: 2404.19429
Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
30 April 2024
Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
MoE
Papers citing "Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping" (8 of 8 papers shown)

Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos, Mark Zhao, Swapnil Gandhi, Thomas Norrie, Shrijeet Mukherjee, Christos Kozyrakis
MoE · 91 · 0 · 0 · 28 Apr 2025

MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
Seokjin Go, Divya Mahajan
MoE · 69 · 0 · 0 · 10 Feb 2025

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang, Chengming Zhang, Zheyu Shen, Ang Li, Olatunji Ruwase
36 · 3 · 0 · 23 Sep 2024

Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement
Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo
66 · 3 · 0 · 05 Jul 2024

Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, ..., Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong
MoE · 97 · 110 · 0 · 07 Jun 2022

Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE · 160 · 327 · 0 · 18 Feb 2022

M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
Junyang Lin, An Yang, Jinze Bai, Chang Zhou, Le Jiang, ..., Jie Zhang, Yong Li, Wei Lin, Jingren Zhou, Hongxia Yang
MoE · 92 · 43 · 0 · 08 Oct 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 245 · 1,821 · 0 · 17 Sep 2019