Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun, Torsten Hoefler
GNN · 26 February 2018 · arXiv:1802.09941

Papers citing "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis"

45 of 95 citing papers shown.
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis
BDL, GNN, LRM · 20 Feb 2021

Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence
Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, P. Tsigas
17 Feb 2021

Clairvoyant Prefetching for Distributed Machine Learning I/O
Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler
21 Jan 2021

PFL-MoE: Personalized Federated Learning Based on Mixture of Experts
Binbin Guo, Yuan Mei, Danyang Xiao, Weigang Wu, Ye Yin, Hongli Chang
MoE · 31 Dec 2020

Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression
Cody Blakeney, Xiaomin Li, Yan Yan, Ziliang Zong
05 Dec 2020

Integrating Deep Learning in Domain Sciences at Exascale
Rick Archibald, E. Chow, E. D'Azevedo, Jack J. Dongarra, M. Eisenbach, ..., Florent Lopez, Daniel Nichols, S. Tomov, Kwai Wong, Junqi Yin
PINN · 23 Nov 2020

Distributed Deep Reinforcement Learning: An Overview
Mohammad Reza Samsami, Hossein Alimadad
OffRL · 22 Nov 2020

A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression
Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao
AI4CE · 18 Nov 2020

Understanding Capacity-Driven Scale-Out Neural Recommendation Inference
Michael Lui, Yavuz Yetim, Özgür Özkan, Zhuoran Zhao, Shin-Yeh Tsai, Carole-Jean Wu, Mark Hempstead
GNN, BDL, LRM · 04 Nov 2020

Towards a Scalable and Distributed Infrastructure for Deep Learning Applications
Bita Hasheminezhad, S. Shirzad, Nanmiao Wu, Patrick Diehl, Hannes Schulz, Hartmut Kaiser
GNN, AI4CE · 06 Oct 2020

HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients
Enmao Diao, Jie Ding, Vahid Tarokh
FedML · 03 Oct 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
17 Sep 2020

Federated Transfer Learning with Dynamic Gradient Aggregation
Dimitrios Dimitriadis, K. Kumatani, R. Gmyr, Yashesh Gaur, Sefik Emre Eskimez
FedML · 06 Aug 2020

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Yosuke Oyama, N. Maruyama, Nikoli Dryden, Erin McCarthy, P. Harrington, J. Balewski, Satoshi Matsuoka, Peter Nugent, B. Van Essen
3DV, AI4CE · 25 Jul 2020

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid
Andrew Kirby, S. Samsi, Michael Jones, Albert Reuther, J. Kepner, V. Gadepally
14 Jul 2020

Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
30 Jun 2020

GPU-Accelerated Discontinuous Galerkin Methods: 30x Speedup on 345 Billion Unknowns
Andrew C. Kirby, D. Mavriplis
28 Jun 2020

Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies
Yu Huang, Yue Chen
3DPC · 10 Jun 2020

Reducing Communication in Graph Neural Network Training
Alok Tripathy, Katherine Yelick, A. Buluç
GNN · 07 May 2020

A Review of Privacy-preserving Federated Learning for the Internet-of-Things
Christopher Briggs, Zhong Fan, Péter András
24 Apr 2020

Characterizing and Modeling Distributed Training with Transient Cloud GPU Servers
Shijian Li, R. Walls, Tian Guo
07 Apr 2020

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
06 Mar 2020

Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence
S. Raschka, Joshua Patterson, Corey J. Nolet
AI4CE · 12 Feb 2020

Better Theory for SGD in the Nonconvex World
Ahmed Khaled, Peter Richtárik
09 Feb 2020

Machine Unlearning
Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot
MU · 09 Dec 2019

A Graph Autoencoder Approach to Causal Structure Learning
Ignavier Ng, Shengyu Zhu, Zhitang Chen, Zhuangyan Fang
BDL, CML · 18 Nov 2019

Node-Aware Improvements to Allreduce
Amanda Bienz, Luke N. Olson, W. Gropp
21 Oct 2019

Exascale Deep Learning to Accelerate Cancer Research
Robert M. Patton, J. T. Johnston, Steven R. Young, Catherine D. Schuman, T. Potok, ..., Junghoon Chae, L. Hou, Shahira Abousamra, Dimitris Samaras, Joel H. Saltz
26 Sep 2019

Gradient Descent with Compressed Iterates
Ahmed Khaled, Peter Richtárik
10 Sep 2019

Techniques for Automated Machine Learning
Yi-Wei Chen, Qingquan Song, Xia Hu
21 Jul 2019

A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior
Keeley Criswell, Tosiron Adegbija
16 Jul 2019

Decentralized Learning of Generative Adversarial Networks from Non-iid Data
Ryo Yonetani, Tomohiro Takahashi, Atsushi Hashimoto, Yoshitaka Ushiku
23 May 2019

Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
Nikoli Dryden, N. Maruyama, Tom Benson, Tim Moon, M. Snir, B. Van Essen
15 Mar 2019

Speeding up Deep Learning with Transient Servers
Shijian Li, R. Walls, Lijie Xu, Tian Guo
28 Feb 2019

Graph Processing on FPGAs: Taxonomy, Survey, Challenges
Maciej Besta, Dimitri Stanojevic, Johannes de Fine Licht, Tal Ben-Nun, Torsten Hoefler
GNN, AI4CE · 25 Feb 2019

Augment your batch: better training with larger batches
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
ODL · 27 Jan 2019

No Peek: A Survey of private distributed deep learning
Praneeth Vepakomma, Tristan Swedish, Ramesh Raskar, O. Gupta, Abhimanyu Dubey
SyDa, FedML · 08 Dec 2018

Wireless Network Intelligence at the Edge
Jihong Park, S. Samarakoon, M. Bennis, Mérouane Debbah
07 Dec 2018

Split learning for health: Distributed deep learning without sharing raw patient data
Praneeth Vepakomma, O. Gupta, Tristan Swedish, Ramesh Raskar
FedML · 03 Dec 2018

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
A. A. Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, D. Panda
GNN · 25 Oct 2018

Characterizing Deep-Learning I/O Workloads in TensorFlow
Steven W. D. Chien, Stefano Markidis, C. Sishtla, Luís Santos, Pawel Herman, Sai B. Narasimhamurthy, Erwin Laure
06 Oct 2018

SparCML: High-Performance Sparse Communication for Machine Learning
Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler
22 Feb 2018

On the Origin of Deep Learning
Haohan Wang, Bhiksha Raj
MedIm, 3DV, VLM · 24 Feb 2017

Neural Architecture Search with Reinforcement Learning
Barret Zoph, Quoc V. Le
05 Nov 2016

Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao
07 Dec 2010