PipeMare: Asynchronous Pipeline Parallel DNN Training
arXiv:1910.05124, 9 October 2019
Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa
Papers citing "PipeMare: Asynchronous Pipeline Parallel DNN Training" (50 of 60 papers shown)
- Nesterov Method for Asynchronous Pipeline Parallel Optimization. Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, Alexander Long. 02 May 2025.
- A Survey on Memory-Efficient Large-Scale Model Training in AI for Science. Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Dongsheng Li. 21 Jan 2025.
- FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion. Li-Wen Chang, Yiyuan Ma, Qi Hou, Chengquan Jiang, Ningxin Zheng, ..., Zuquan Song, Ziheng Jiang, Yanghua Peng, Xuanzhe Liu, Xin Liu. 11 Jun 2024.
- Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training. Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun. 05 Jun 2024.
- PETRA: Parallel End-to-end Training with Reversible Architectures. Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon. 04 Jun 2024.
- 2BP: 2-Stage Backpropagation. Christopher Rae, Joseph K. L. Lee, James Richings. 28 May 2024. [MoE, MQ]
- Pipeline Parallelism with Controllable Memory. Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin. 24 May 2024.
- Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training. Muhammad Adnan, Amar Phanishayee, Janardhan Kulkarni, Prashant J. Nair, Divyat Mahajan. 23 Apr 2024.
- Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks. Louis Fournier, Edouard Oyallon. 13 Mar 2024.
- Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System. Hongsun Jang, Jaeyong Song, Jaewon Jung, Jaeyoung Park, Youngsok Kim, Jinho Lee. 11 Mar 2024.
- Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices. A. Menon, Unnikrishnan Menon, Kailash Ahirwar. 03 Jan 2024.
- Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference. Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang. 23 Dec 2023.
- Distributed Inference and Fine-tuning of Large Language Models Over The Internet. Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, Colin Raffel. 13 Dec 2023. [MoE, ALM]
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey. Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang. 01 Dec 2023.
- Exploring the Robustness of Decentralized Training for Large Language Models. Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou. 01 Dec 2023.
- PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction. Lei Guan, Dongsheng Li, Jiye Liang, Wenjian Wang, Xicheng Lu. 01 Dec 2023.
- Zero Bubble Pipeline Parallelism. Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin. 30 Nov 2023.
- Practical Performance Guarantees for Pipelined DNN Inference. Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu. 07 Nov 2023.
- AMSP: Reducing Communication Overhead of ZeRO for Efficient LLM Training. Qiaoling Chen, Qi Hu, Guoteng Wang, Zhisheng Ye, Ting Huang, ..., Yang Gao, Hang Yan, Yonggang Wen, Tianwei Zhang, Peng Sun. 01 Nov 2023.
- Saturn: An Optimized Data System for Large Model Deep Learning Workloads. Kabir Nagrecha, Arun Kumar. 03 Sep 2023.
- Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency. Ziming Liu, Shenggan Cheng, Hao Zhou, Yang You. 30 Aug 2023.
- Improving Automatic Parallel Training via Balanced Memory Workload Optimization. Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui. 05 Jul 2023.
- OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning. Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui. 17 May 2023.
- Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches. Siyu Wang, Zongyan Cao, Chang Si, Lansong Diao, Jiamang Wang, W. Lin. 03 Mar 2023.
- Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training. Siddharth Singh, A. Bhatele. 10 Feb 2023.
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient. Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov. 27 Jan 2023. [MoE]
- Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression. Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee. 24 Jan 2023. [GNN]
- Baechi: Fast Device Placement of Machine Learning Graphs. Beomyeol Jeon, L. Cai, Chirag Shetty, P. Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta. 20 Jan 2023. [GNN]
- AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost. Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler. 17 Jan 2023.
- Systems for Parallel and Distributed Large-Model Deep Learning Training. Kabir Nagrecha. 06 Jan 2023. [GNN, VLM, MoE]
- PiPar: Pipeline Parallelism for Collaborative Machine Learning. Zihan Zhang, Philip Rodgers, Peter Kilpatrick, I. Spence, Blesson Varghese. 01 Dec 2022. [FedML]
- Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism. Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui. 25 Nov 2022. [GNN, MoE]
- Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR. Sami Alabed, Dominik Grewe, Juliana Franco, Bart Chrzaszcz, Tom Natan, Tamara Norman, Norman A. Rink, Dimitrios Vytiniotis, Michael Schaarschmidt. 07 Oct 2022. [MoE]
- HammingMesh: A Network Topology for Large-Scale Deep Learning. Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott. 03 Sep 2022. [3DH, GNN, AI4CE]
- Layer-Wise Partitioning and Merging for Efficient and Scalable Deep Learning. S. Akintoye, Liangxiu Han, H. Lloyd, Xin Zhang, Darren Dancey, Haoming Chen, Daoqiang Zhang. 22 Jul 2022. [FedML]
- RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network. Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, D. DeCoste. 28 Jun 2022.
- Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models. Zhiquan Lai, Shengwei Li, Xudong Tang, Ke-shi Ge, Weijie Liu, Yabo Duan, Linbo Qiao, Dongsheng Li. 10 Jun 2022.
- Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees. Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang. 02 Jun 2022. [AI4CE]
- MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud. Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul Chilimbi, Mu Li, Xin Jin. 30 Apr 2022.
- Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs. John Thorpe, Pengzhan Zhao, Jon Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, Guoqing Harry Xu. 26 Apr 2022.
- Pathways: Asynchronous Distributed Dataflow for ML. P. Barham, Aakanksha Chowdhery, J. Dean, Sanjay Ghemawat, Steven Hand, ..., Parker Schuh, Ryan Sepassi, Laurent El Shafey, C. A. Thekkath, Yonghui Wu. 23 Mar 2022. [GNN, MoE]
- PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication. Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Namjae Kim, Yingyan Lin. 20 Mar 2022. [GNN]
- Survey on Large Scale Neural Network Training. Julia Gusak, Daria Cherniuk, Alena Shilova, A. Katrutsa, Daniel Bershatsky, ..., Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont. 21 Feb 2022.
- Egeria: Efficient DNN Training with Knowledge-Guided Layer Freezing. Yiding Wang, D. Sun, Kai Chen, Fan Lai, Mosharaf Chowdhury. 17 Jan 2022.
- Automap: Towards Ergonomic Automated Parallelism for ML Models. Michael Schaarschmidt, Dominik Grewe, Dimitrios Vytiniotis, Adam Paszke, G. Schmid, ..., James Molloy, Jonathan Godwin, Norman A. Rink, Vinod Nair, Dan Belov. 06 Dec 2021. [MoE]
- Varuna: Scalable, Low-cost Training of Massive Deep Learning Models. Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, Nipun Kwatra. 07 Nov 2021. [GNN]
- Pipeline Parallelism for Inference on Heterogeneous Edge Computing. Yang Hu, Connor Imes, Xuanang Zhao, Souvik Kundu, P. Beerel, S. Crago, J. Walters. 28 Oct 2021. [MoE]
- AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning. Siddharth Singh, A. Bhatele. 25 Oct 2021. [GNN]
- Hydra: A System for Large Multi-Model Deep Learning. Kabir Nagrecha, Arun Kumar. 16 Oct 2021. [MoE, AI4CE]
- Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. Shigang Li, Torsten Hoefler. 14 Jul 2021. [GNN, AI4CE, LRM]