ResearchTrend.AI

Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
arXiv: 2411.15871
24 November 2024
Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng-rong Li

Papers citing "Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution" (14 papers):

1. Optimizing Large Model Training through Overlapped Activation Recomputation
   Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, ..., Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen
   13 Jun 2024

2. Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies
   P. Basu, Liangyu Zhao, Jason Fantl, Siddharth Pal, Arvind Krishnamurthy, J. Khoury
   24 Sep 2023

3. TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning
   William Won, Suvinay Subramanian, Sudarshan Srinivasan, A. Durg, Samvit Kaul, Swati Gupta, Tushar Krishna
   11 Apr 2023

4. Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
   Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
   24 Jan 2023

5. Reducing Activation Recomputation in Large Transformer Models
   V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, Mohammad Shoeybi, Bryan Catanzaro
   10 May 2022

6. Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
   Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang
   19 Aug 2021

7. Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
   Deepak Narayanan, Mohammad Shoeybi, Jared Casper, P. LeGresley, M. Patwary, ..., Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia
   09 Apr 2021

8. FastMoE: A Fast Mixture-of-Expert Training System
   Jiaao He, J. Qiu, Aohan Zeng, Zhilin Yang, Jidong Zhai, Jie Tang
   24 Mar 2021

9. Automatic Horizontal Fusion for GPU Kernels
   Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long
   02 Jul 2020

10. Language Models are Few-Shot Learners
    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
    28 May 2020

11. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
    Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
    17 Sep 2019

12. Priority-based Parameter Propagation for Distributed DNN Training
    Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko
    10 May 2019

13. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
    Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, R. Campbell
    08 Mar 2018

14. Horovod: fast and easy distributed deep learning in TensorFlow
    Alexander Sergeev, Mike Del Balso
    15 Feb 2018