Beyond Data and Model Parallelism for Deep Neural Networks

14 July 2018
Zhihao Jia, Matei A. Zaharia, A. Aiken
Tags: GNN, AI4CE

Papers citing "Beyond Data and Model Parallelism for Deep Neural Networks"

Showing 50 of 87 citing papers.
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin, Jingrong Chen, Xinhao Kong, Yongji Wu, Liang Luo, Zihan Wang, Ying Zhang, Tingjun Chen, Alvin R. Lebeck, Danyang Zhuo
02 May 2025

TAGC: Optimizing Gradient Communication in Distributed Transformer Training
Igor Polyakov, Alexey Dukhanov, Egor Spirin
08 Apr 2025

AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding
Zikun Li, Zhuofu Chen, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, ..., Zhuoming Chen, Sean Lai, Xinhao Cheng, Xupeng Miao, Zhihao Jia
21 Jan 2025

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn
20 Nov 2024

Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
Zhihong Liu, Xin Xu, Peng Qiao, Dongsheng Li
Tags: OffRL
08 Nov 2024

FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna
28 Jun 2024

A Survey of Distributed Learning in Cloud, Mobile, and Edge Settings
Madison Threadgill, A. Gerstlauer
23 May 2024

Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
Muhammad Adnan, Amar Phanishayee, Janardhan Kulkarni, Prashant J. Nair, Divyat Mahajan
23 Apr 2024

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He, Jidong Zhai
18 Mar 2024

Partitioned Neural Network Training via Synthetic Intermediate Labels
C. V. Karadag, Nezih Topaloglu
17 Mar 2024

Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks
Louis Fournier, Edouard Oyallon
13 Mar 2024

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, Wei Lin
11 Jan 2024

Federated Learning is Better with Non-Homomorphic Encryption
Konstantin Burlachenko, Abdulmajeed Alrowithi, Fahad Ali Albalawi, Peter Richtárik
Tags: FedML
04 Dec 2023

UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin, Ke Wu, Jie Li, Jun Yu Li, Wu-Jun Li
31 Jul 2023

A Survey From Distributed Machine Learning to Distributed Deep Learning
Mohammad Dehghani, Zahra Yazdanparast
11 Jul 2023

Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training
Shengwei Li, Zhiquan Lai, Yanqi Hao, Weijie Liu, Ke-shi Ge, Xiaoge Deng, Dongsheng Li, KaiCheng Lu
25 May 2023

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, ..., Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
Tags: LRM
16 May 2023

Energy-Efficient GPU Clusters Scheduling for Deep Learning
Diandian Gu, Xintong Xie, Gang Huang, Xin Jin, Xuanzhe Liu
Tags: GNN
13 Apr 2023

Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami
Tags: MQ
27 Feb 2023

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, W. Lin
Tags: AI4CE
16 Feb 2023

Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment
Shiwei Zhang, Xiaodong Yi, Lansong Diao, Chuan Wu, Siyu Wang, W. Lin
Tags: GNN
13 Feb 2023

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
Yuliang Liu, Shenggui Li, Jiarui Fang, Yan Shao, Boyuan Yao, Yang You
Tags: OffRL
06 Feb 2023

TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation
Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Xianyan Jia, Yong Li, Chencan Wu, Jialin Li, Wei Lin
Tags: GNN
01 Feb 2023

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov
Tags: MoE
27 Jan 2023

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
Tags: GNN
24 Jan 2023

Systems for Parallel and Distributed Large-Model Deep Learning Training
Kabir Nagrecha
Tags: GNN, VLM, MoE
06 Jan 2023

Does compressing activations help model parallel training?
S. Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman
06 Jan 2023

Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution
Edgar Liberis, Nicholas D. Lane
30 Nov 2022

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui
Tags: GNN, MoE
25 Nov 2022

Distributed Graph Neural Network Training: A Survey
Yingxia Shao, Hongzheng Li, Xizhi Gu, Hongbo Yin, Yawen Li, Xupeng Miao, Wentao Zhang, Bin Cui, Lei Chen
Tags: GNN, AI4CE
01 Nov 2022

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
Dacheng Li, Hongyi Wang, Eric P. Xing, Haotong Zhang
Tags: MoE
13 Oct 2022

Dataloader Parameter Tuner: An Automated Dataloader Parameter Tuner for Deep Learning Models
Jooyoung Park, DoangJoo Synn, XinYu Piao, Jong-Kook Kim
11 Oct 2022

Demystifying Map Space Exploration for NPUs
Sheng-Chun Kao, A. Parashar, Po-An Tsai, T. Krishna
07 Oct 2022

DreamShard: Generalizable Embedding Table Placement for Recommender Systems
Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, A. Kejariwal, Xia Hu
Tags: LMTD, OffRL
05 Oct 2022

Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion
Xiaodong Yi, Shiwei Zhang, Lansong Diao, Chuan Wu, Zhen Zheng, Shiqing Fan, Siyu Wang, Jun Yang, W. Lin
26 Sep 2022

HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott
Tags: 3DH, GNN, AI4CE
03 Sep 2022

A simplified convergence theory for Byzantine resilient stochastic gradient descent
Lindon Roberts, E. Smyth
25 Aug 2022

Layer-Wise Partitioning and Merging for Efficient and Scalable Deep Learning
S. Akintoye, Liangxiu Han, H. Lloyd, Xin Zhang, Darren Dancey, Haoming Chen, Daoqiang Zhang
Tags: FedML
22 Jul 2022

Reducing Activation Recomputation in Large Transformer Models
V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, M. Shoeybi, Bryan Catanzaro
Tags: AI4CE
10 May 2022

Neural Architecture Search using Property Guided Synthesis
Charles Jin, P. Phothilimthana, Sudip Roy
08 May 2022

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul Chilimbi, Mu Li, Xin Jin
30 Apr 2022

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models
Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wen-ping Ma, Xinbing Wang, Chenghu Zhou
28 Apr 2022

Efficient Neural Network Analysis with Sum-of-Infeasibilities
Haoze Wu, Aleksandar Zeljić, Guy Katz, Clark W. Barrett
Tags: AAML
19 Mar 2022

Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation
Liu Ke, Udit Gupta, Mark Hempstead, Carole-Jean Wu, Hsien-Hsin S. Lee, Xuan Zhang
14 Mar 2022

BagPipe: Accelerating Deep Recommendation Model Training
Saurabh Agarwal, Chengpo Yan, Ziyi Zhang, Shivaram Venkataraman
24 Feb 2022

Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
Alexander Isenko, R. Mayer, Jeffrey Jedele, Hans-Arno Jacobsen
17 Feb 2022

Efficient Direct-Connect Topologies for Collective Communications
Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, P. Basu, J. Khoury, Arvind Krishnamurthy
07 Feb 2022

DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices
Xueyu Hou, Yongjie Guan, Tao Han, Ning Zhang
03 Feb 2022

TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang, Moein Khazraee, Zhizhen Zhong, M. Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, A. Kewitsch
01 Feb 2022

Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha, Arun Kumar
Tags: MoE, AI4CE
16 Oct 2021