Supporting Very Large Models using Automatic Dataflow Graph Partitioning

Minjie Wang, Chien-chin Huang, Jinyang Li
24 July 2018

Papers citing "Supporting Very Large Models using Automatic Dataflow Graph Partitioning"

50 / 63 papers shown

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Yinmin Zhong, Zili Zhang, Xiaoniu Song, Hanpeng Hu, Chao Jin, ..., Changyi Wan, Hongyu Zhou, Yimin Jiang, Yibo Zhu, Daxin Jiang
OffRL, AI4TS
22 Apr 2025

Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
Zhanda Zhu, Christina Giannoula, Muralidhar Andoorveedu, Qidong Su, Karttikeya Mangalam, Bojian Zheng, Gennady Pekhimenko
VLM, MoE
24 Mar 2025

BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training
Houming Wu, Ling Chen, Wenjie Yu
AI4CE
25 Oct 2024

HybridFlow: A Flexible and Efficient RLHF Framework
Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Size Zheng, Haibin Lin, Chuan Wu
AI4CE
28 Sep 2024

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Xinyi Zhang, Hanyu Zhao, Wencong Xiao, Xianyan Jia, Fei Xu, Yong Li, Wei Lin, Fangming Liu
16 Aug 2024

Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10
Yiqi Liu, Yuqi Xue, Yu Cheng, Lingxiao Ma, Ziming Miao, Jilong Xue, Jian Huang
GNN
09 Aug 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

PaSE: Parallelization Strategies for Efficient DNN Training
Venmugil Elango
04 Jul 2024

On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
Zhengxian Lu, Fangyu Wang, Zhiwei Xu, Fei Yang, Tao Li
02 Jul 2024

On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie
3DGS
26 Jun 2024

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, ..., Xupeng Miao, Mohammad Alizadeh, G. R. Ganger, Tianqi Chen, Zhihao Jia
GNN, AI4CE
24 Jun 2024

SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures
Swapnil Gandhi, Mark Zhao, Athinagoras Skiadopoulos, Christos Kozyrakis
AI4CE, GNN
22 May 2024

A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters
Chunyu Xue, Weihao Cui, Han Zhao, Quan Chen, Shulai Zhang, Peng Yang, Jing Yang, Shaobo Li, Minyi Guo
24 Mar 2024

Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies
Felix Brakel, Uraz Odyurt, A. Varbanescu
GNN
06 Mar 2024

LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction
Zhigang Wang, Hangyu Yang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu
21 Jan 2024

InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen, Diandian Gu, Guoteng Wang, Xun Chen, Yingtong Xiong, ..., Qi Hu, Xin Jin, Yonggang Wen, Tianwei Zhang, Peng Sun
17 Jan 2024

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, Wei Lin
11 Jan 2024

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
01 Dec 2023

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, Minsoo Rhu
27 Nov 2023

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search
Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang
26 Nov 2023

HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading)
Qiange Wang, Yao Chen, Weng-Fai Wong, Bingsheng He
GNN
25 Nov 2023

UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin, Ke Wu, Jie Li, Jun Yu Li, Wu-Jun Li
31 Jul 2023

Improving Automatic Parallel Training via Balanced Memory Workload Optimization
Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui
05 Jul 2023

Proteus: Simulating the Performance of Distributed DNN Training
Jiangfei Duan, Xiuhong Li, Ping Xu, Xingcheng Zhang, Shengen Yan, Yun Liang, Dahua Lin
04 Jun 2023

DISCO: Distributed Inference with Sparse Communications
Minghai Qin, Chaowen Sun, Jaco A. Hofmann, D. Vučinić
FedML
22 Feb 2023

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, W. Lin
AI4CE
16 Feb 2023

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
Yuliang Liu, Shenggui Li, Jiarui Fang, Yan Shao, Boyuan Yao, Yang You
OffRL
06 Feb 2023

TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation
Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Xianyan Jia, Yong Li, Chencan Wu, Jialin Li, Wei Lin
GNN
01 Feb 2023

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction
Zhiqi Lin, Youshan Miao, Guodong Liu, Xiaoxiang Shi, Quanlu Zhang, ..., Xu Cao, Cheng-Wu Li, Mao Yang, Lintao Zhang, Lidong Zhou
21 Jan 2023

Baechi: Fast Device Placement of Machine Learning Graphs
Beomyeol Jeon, L. Cai, Chirag Shetty, P. Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta
GNN
20 Jan 2023

AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler
17 Jan 2023

TAPS: Topology-Aware Intra-Operator Parallelism Strategy Searching Algorithm for Deep Neural Networks
Peng Liang, Hao Zheng, Teng Su, Linbo Qiao, Dongsheng Li
11 Jan 2023

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui
GNN, MoE
25 Nov 2022

On Optimizing the Communication of Model Parallelism
Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Haotong Zhang
10 Nov 2022

Demand Layering for Real-Time DNN Inference with Minimized Memory Usage
Min-Zhi Ji, Saehanseul Yi, Chang-Mo Koo, Sol Ahn, Dongjoo Seo, N. Dutt, Jong-Chan Kim
08 Oct 2022

Memory Safe Computations with XLA Compiler
A. Artemev, Tilman Roeder, Mark van der Wilk
28 Jun 2022

Reducing Activation Recomputation in Large Transformer Models
V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, M. Shoeybi, Bryan Catanzaro
AI4CE
10 May 2022

DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices
Xueyu Hou, Yongjie Guan, Tao Han, Ning Zhang
03 Feb 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, ..., Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica
MoE
28 Jan 2022

Automap: Towards Ergonomic Automated Parallelism for ML Models
Michael Schaarschmidt, Dominik Grewe, Dimitrios Vytiniotis, Adam Paszke, G. Schmid, ..., James Molloy, Jonathan Godwin, Norman A. Rink, Vinod Nair, Dan Belov
MoE
06 Dec 2021

Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters
Xiangru Lian, Binhang Yuan, Xuefeng Zhu, Yulong Wang, Yongjun He, ..., Lei Yuan, Hai-bo Yu, Sen Yang, Ce Zhang, Ji Liu
VLM
10 Nov 2021

OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
Jinhui Yuan, Xinqi Li, Cheng Cheng, Juncheng Liu, Ran Guo, ..., Fei Yang, Xiaodong Yi, Chuan Wu, Haoran Zhang, Jie Zhao
28 Oct 2021

On the Future of Cloud Engineering
David Bermbach, A. Chandra, C. Krintz, A. Gokhale, Aleksander Slominski, L. Thamsen, Everton Cavalcante, Tian Guo, Ivona Brandić, R. Wolski
19 Aug 2021

BAGUA: Scaling up Distributed Learning with System Relaxations
Shaoduo Gan, Xiangru Lian, Rui Wang, Jianbin Chang, Chengjun Liu, ..., Jiawei Jiang, Binhang Yuan, Sen Yang, Ji Liu, Ce Zhang
03 Jul 2021

Pre-Trained Models: Past, Present and Future
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
AIFin, MQ, AI4MH
14 Jun 2021

GSPMD: General and Scalable Parallelization for ML Computation Graphs
Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake A. Hechtman, Yanping Huang, ..., Noam M. Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen
MoE
10 May 2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Jianfei Chen, Lianmin Zheng, Z. Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, Joseph E. Gonzalez
MQ
29 Apr 2021

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi-Lun Liao, ..., Gaojun Fan, Yaowei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian
ALM, MoE, AI4CE
26 Apr 2021

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He
GNN
16 Apr 2021

Automatic Graph Partitioning for Very Large-scale Deep Learning
Masahiro Tanaka, Kenjiro Taura, T. Hanawa, Kentaro Torisawa
GNN, AI4CE
30 Mar 2021