1802.09941
Cited By
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
26 February 2018
Tal Ben-Nun
Torsten Hoefler
GNN
Papers citing
"Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis"
50 / 77 papers shown
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training
Yijie Zheng
Bangjun Xiao
Lei Shi
Xiaoyang Li
Faming Wu
Tianyu Li
Xuefeng Xiao
Y. Zhang
Y. Wang
Shouda Liu
MLLM
MoE
67
1
0
31 Mar 2025
Seesaw: High-throughput LLM Inference via Model Re-sharding
Qidong Su
Wei Zhao
X. Li
Muralidhar Andoorveedu
Chenhao Jiang
Zhanda Zhu
Kevin Song
Christina Giannoula
Gennady Pekhimenko
LRM
72
0
0
09 Mar 2025
Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther
Xiaozhe Yao
Tolga Kerimoglu
Ana Klimovic
Viktor Gsteiger
MoE
99
0
0
27 Feb 2025
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
Zhihong Liu
Xin Xu
Peng Qiao
Dongsheng Li
OffRL
20
2
0
08 Nov 2024
Going Forward-Forward in Distributed Deep Learning
Ege Aktemur
Ege Zorlutuna
Kaan Bilgili
Tacettin Emre Bok
Berrin Yanikoglu
Suha Orhun Mutluergil
FedML
18
1
0
30 Mar 2024
Machine learning and domain decomposition methods -- a survey
A. Klawonn
M. Lanser
J. Weber
AI4CE
16
7
0
21 Dec 2023
Graft: Efficient Inference Serving for Hybrid Deep Learning with SLO Guarantees via DNN Re-alignment
Jing Wu
Lin Wang
Qirui Jin
Fangming Liu
23
11
0
17 Dec 2023
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan
Dongsheng Li
Jiye Liang
Wenjian Wang
Xicheng Lu
20
1
0
01 Dec 2023
Convergence Analysis of Decentralized ASGD
Mauro Dalle Lucca Tosi
Martin Theobald
23
2
0
07 Sep 2023
OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs
Mikhail Khalilov
Marcin Chrapek
Siyuan Shen
Alessandro Vezzu
Thomas Emanuel Benz
Salvatore Di Girolamo
Timo Schneider
Daniele Di Sensi
Luca Benini
Torsten Hoefler
30
6
0
07 Sep 2023
A Survey From Distributed Machine Learning to Distributed Deep Learning
Mohammad Dehghani
Zahra Yazdanparast
15
0
0
11 Jul 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
25
2
0
18 Jun 2023
The Evolution of Distributed Systems for Graph Neural Networks and their Origin in Graph Processing and Deep Learning: A Survey
Jana Vatter
R. Mayer
Hans-Arno Jacobsen
GNN
AI4TS
AI4CE
37
23
0
23 May 2023
Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching
S. Tyagi
Prateek Sharma
16
22
0
20 May 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
22
31
0
27 Jan 2023
A Theory of I/O-Efficient Sparse Neural Network Inference
Niels Gleinig
Tal Ben-Nun
Torsten Hoefler
19
0
0
03 Jan 2023
Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox
Qiyue Yin
Tongtong Yu
S. Shen
Jun Yang
Meijing Zhao
Kaiqi Huang
Bin Liang
Liangsheng Wang
OffRL
20
13
0
01 Dec 2022
LOFT: Finding Lottery Tickets through Filter-wise Training
Qihan Wang
Chen Dun
Fangshuo Liao
C. Jermaine
Anastasios Kyrillidis
18
3
0
28 Oct 2022
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability
Daniele De Sensi
T. De Matteis
Konstantin Taranov
Salvatore Di Girolamo
Tobias Rahn
Torsten Hoefler
21
12
0
27 Oct 2022
Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference
Zehuan Wang
Yingcan Wei
Minseok Lee
Matthias Langer
F. Yu
...
Daniel G. Abel
Xu Guo
Jianbing Dong
Ji Shi
Kunlun Li
GNN
LRM
19
32
0
17 Oct 2022
Downlink Compression Improves TopK Sparsification
William Zou
H. Sterck
Jun Liu
14
0
0
30 Sep 2022
HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler
Tommaso Bonato
Daniele De Sensi
Salvatore Di Girolamo
Shigang Li
Marco Heddes
Jon Belk
Deepak Goel
Miguel Castro
Steve Scott
3DH
GNN
AI4CE
21
20
0
03 Sep 2022
FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning
Nang Hung Nguyen
Phi Le Nguyen
D. Nguyen
Trung Thanh Nguyen
Thuy-Dung Nguyen
H. Pham
Truong Thao Nguyen
FedML
59
24
0
04 Aug 2022
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis
Maciej Besta
Torsten Hoefler
GNN
34
56
0
19 May 2022
The spatial computer: A model for energy-efficient parallel computation
Lukas Gianinazzi
Tal Ben-Nun
Maciej Besta
Saleh Ashkboos
Yves Baumann
Piotr Luczynski
Torsten Hoefler
16
5
0
10 May 2022
Byzantine Fault Tolerance in Distributed Machine Learning: A Survey
Djamila Bouhata
Hamouma Moumen
Moumen Hamouma
Ahcène Bounceur
AI4CE
25
7
0
05 May 2022
Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences
G. Moon
E. Cyr
22
5
0
07 Mar 2022
Sky Computing: Accelerating Geo-distributed Computing in Federated Learning
Jie Zhu
Shenggui Li
Yang You
FedML
14
5
0
24 Feb 2022
Shisha: Online scheduling of CNN pipelines on heterogeneous architectures
Pirah Noor Soomro
M. Abduljabbar
J. Castrillón
Miquel Pericàs
12
1
0
23 Feb 2022
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
Sian Jin
Chengming Zhang
Xintong Jiang
Yunhe Feng
Hui Guan
Guanpeng Li
S. Song
Dingwen Tao
23
23
0
18 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
22
14
0
01 Nov 2021
Exponential Graph is Provably Efficient for Decentralized Deep Training
Bicheng Ying
Kun Yuan
Yiming Chen
Hanbin Hu
Pan Pan
W. Yin
FedML
34
83
0
26 Oct 2021
Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha
Arun Kumar
MoE
AI4CE
30
5
0
16 Oct 2021
Dynamic Neural Network Architectural and Topological Adaptation and Related Methods -- A Survey
Lorenz Kummer
AI4CE
32
0
0
28 Jul 2021
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li
Torsten Hoefler
GNN
AI4CE
LRM
77
131
0
14 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun
Cameron R. Wolfe
C. Jermaine
Anastasios Kyrillidis
16
21
0
02 Jul 2021
Flare: Flexible In-Network Allreduce
Daniele De Sensi
Salvatore Di Girolamo
Saleh Ashkboos
Shigang Li
Torsten Hoefler
22
40
0
29 Jun 2021
BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu
R. Kettimuthu
M. Papka
Ian T. Foster
29
3
0
22 Jun 2021
Dynamic Gradient Aggregation for Federated Domain Adaptation
Dimitrios Dimitriadis
K. Kumatani
R. Gmyr
Yashesh Gaur
Sefik Emre Eskimez
FedML
15
5
0
14 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
37
813
0
14 Jun 2021
Vector Symbolic Architectures as a Computing Framework for Emerging Hardware
Denis Kleyko
Mike Davies
E. P. Frady
P. Kanerva
Spencer J. Kent
...
Evgeny Osipov
J. Rabaey
D. Rachkovskij
Abbas Rahimi
Friedrich T. Sommer
32
56
0
09 Jun 2021
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
30
366
0
16 Apr 2021
DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation
Boxin Wang
Fan Wu
Yunhui Long
Luka Rimanic
Ce Zhang
Bo-wen Li
FedML
29
63
0
20 Mar 2021
Parareal Neural Networks Emulating a Parallel-in-time Algorithm
Zhanyu Ma
Jiyang Xie
Jingyi Yu
AI4CE
11
9
0
16 Mar 2021
EventGraD: Event-Triggered Communication in Parallel Machine Learning
Soumyadip Ghosh
B. Aquino
V. Gupta
FedML
19
8
0
12 Mar 2021
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
Cameron R. Wolfe
Jingkang Yang
Arindam Chowdhury
Chen Dun
Artun Bayer
Santiago Segarra
Anastasios Kyrillidis
BDL
GNN
LRM
43
9
0
20 Feb 2021
PFL-MoE: Personalized Federated Learning Based on Mixture of Experts
Binbin Guo
Yuan Mei
Danyang Xiao
Weigang Wu
Ye Yin
Hongli Chang
MoE
38
22
0
31 Dec 2020
Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression
Cody Blakeney
Xiaomin Li
Yan Yan
Ziliang Zong
32
39
0
05 Dec 2020
Integrating Deep Learning in Domain Sciences at Exascale
Rick Archibald
E. Chow
E. D'Azevedo
Jack J. Dongarra
M. Eisenbach
...
Florent Lopez
Daniel Nichols
S. Tomov
Kwai Wong
Junqi Yin
PINN
15
5
0
23 Nov 2020
Distributed Deep Reinforcement Learning: An Overview
Mohammad Reza Samsami
Hossein Alimadad
OffRL
6
27
0
22 Nov 2020