Horovod: fast and easy distributed deep learning in TensorFlow

Alexander Sergeev, Mike Del Balso · 15 February 2018 · arXiv:1802.05799
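
Horovod's premise is that scaling a single-GPU TensorFlow program to many GPUs should take only a few lines of change: initialize the library, pin one GPU per process, scale the learning rate by the worker count, and wrap the optimizer so gradients are averaged across workers with ring-allreduce. A minimal sketch of that pattern, following Horovod's documented Keras API (the one-layer model and the launch command below are illustrative placeholders, not taken from the paper):

    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()  # one Horovod process per GPU

    # Pin each process to a single local GPU.
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    # Placeholder model; any Keras model is wrapped the same way.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='softmax')])

    # Scale the learning rate by the number of workers, then wrap the
    # optimizer: the wrapper averages gradients with ring-allreduce.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
    model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

    callbacks = [
        # Broadcast rank 0's initial weights so all workers start in sync.
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    ]
    # model.fit(train_data, callbacks=callbacks) then runs data-parallel,
    # launched e.g. as: horovodrun -np 4 python train.py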

Papers citing "Horovod: fast and easy distributed deep learning in TensorFlow"

Showing 50 of 174 citing papers.
FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems
  Rui Ma, E. Georganas, A. Heinecke, Andrew Boutros, Eriko Nurvitadhi · GNN · 22 Apr 2022
Efficient Pipeline Planning for Expedited Distributed DNN Training
  Ziyue Luo, Xiaodong Yi, Guoping Long, Shiqing Fan, Chuan Wu, Jun Yang, Wei Lin · 22 Apr 2022
PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems
  Yuanxing Zhang, Langshi Chen, Siran Yang, Man Yuan, Hui-juan Yi, ..., Yong Li, Dingyang Zhang, Wei Lin, Lin Qu, Bo Zheng · 11 Apr 2022
PerfectDou: Dominating DouDizhu with Perfect Information Distillation
  Yang Guan, Minghuan Liu, Weijun Hong, Weinan Zhang, Fei Fang, Guangjun Zeng, Yue Lin · 30 Mar 2022
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
  Alexander Isenko, R. Mayer, Jeffrey Jedele, Hans-Arno Jacobsen · 17 Feb 2022
Benchmark Assessment for DeepSpeed Optimization Library
  G. Liang, I. Alsmadi · 12 Feb 2022
Efficient Direct-Connect Topologies for Collective Communications
  Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, P. Basu, J. Khoury, Arvind Krishnamurthy · 07 Feb 2022
Distributed Learning With Sparsified Gradient Differences
  Yicheng Chen, Rick S. Blum, Martin Takáč, Brian M. Sadler · 05 Feb 2022
GADGET: Online Resource Optimization for Scheduling Ring-All-Reduce Learning Jobs
  Menglu Yu, Ye Tian, Bo Ji, Chuan Wu, Hridesh Rajan, Jia-Wei Liu · 02 Feb 2022
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
  Weiyang Wang, Moein Khazraee, Zhizhen Zhong, M. Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, A. Kewitsch · 01 Feb 2022
You May Not Need Ratio Clipping in PPO
  Mingfei Sun, Vitaly Kurin, Guoqing Liu, Sam Devlin, Tao Qin, Katja Hofmann, Shimon Whiteson · 31 Jan 2022
Benchmarking Resource Usage for Efficient Distributed Deep Learning
  Nathan C. Frey, Baolin Li, Joseph McDonald, Dan Zhao, Michael Jones, David Bestor, Devesh Tiwari, V. Gadepally, S. Samsi · 28 Jan 2022
Scientific Machine Learning through Physics-Informed Neural Networks: Where we are and What's next
  S. Cuomo, Vincenzo Schiano Di Cola, F. Giampaolo, G. Rozza, Maizar Raissi, F. Piccialli · PINN · 14 Jan 2022
FRuDA: Framework for Distributed Adversarial Domain Adaptation
  Shaoduo Gan, Akhil Mathur, Anton Isopoussu, F. Kawsar, N. Bianchi-Berthouze, Nicholas D. Lane · 26 Dec 2021
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
  Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang · ELM · 23 Dec 2021
HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework
  Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi-Xin Yang, Yangyu Tao, Bin Cui · 14 Dec 2021
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
  Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao · 18 Nov 2021
Benchmarking and scaling of deep learning models for land cover image classification
  Ioannis Papoutsis, N. Bountos, Angelos Zavras, Dimitrios Michail, Christos Tryfonopoulos · 18 Nov 2021
BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning
  Bicheng Ying, Kun Yuan, Hanbin Hu, Yiming Chen, W. Yin · FedML · 08 Nov 2021
A System for General In-Hand Object Re-Orientation
  Tao Chen, Jie Xu, Pulkit Agrawal · 04 Nov 2021
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
  Yongbin Li, Hongxin Liu, Zhengda Bian, Boxiang Wang, Haichen Huang, Fan Cui, Chuan-Qing Wang, Yang You · GNN · 28 Oct 2021
AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning
  Siddharth Singh, A. Bhatele · GNN · 25 Oct 2021
Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers
  Yujing Ma, Florin Rusu, Kesheng Wu, A. Sim · 13 Oct 2021
Relative Molecule Self-Attention Transformer
  Lukasz Maziarka, Dawid Majchrowski, Tomasz Danel, Piotr Gaiñski, Jacek Tabor, Igor T. Podolak, Pawel M. Morkisz, Stanislaw Jastrzebski · MedIm · 12 Oct 2021
Solon: Communication-efficient Byzantine-resilient Distributed Training via Redundant Gradients
  Lingjiao Chen, Leshang Chen, Hongyi Wang, S. Davidson, Yan Sun · FedML · 04 Oct 2021
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device
  Ji Lin, Chuang Gan, Kuan-Chieh Wang, Song Han · 27 Sep 2021
Neural Architecture Search in operational context: a remote sensing case-study
  Anthony Cazasnoves, Pierre-Antoine Ganaye, Kévin Sanchis, Tugdual Ceillier · 15 Sep 2021
Multilingual Translation via Grafting Pre-trained Language Models
  Zewei Sun, Mingxuan Wang, Lei Li · AI4CE · 11 Sep 2021
HPTMT Parallel Operators for High Performance Data Science & Data Engineering
  V. Abeykoon, Supun Kamburugamuve, Chathura Widanage, Niranda Perera, A. Uyar, Thejaka Amila Kanewala, G. V. Laszewski, Geoffrey C. Fox · AI4TS · 13 Aug 2021
You Do Not Need a Bigger Boat: Recommendations at Reasonable Scale in a (Mostly) Serverless and Open Stack
  Jacopo Tagliabue · 15 Jul 2021
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
  Shigang Li, Torsten Hoefler · GNN, AI4CE, LRM · 14 Jul 2021
An Efficient DP-SGD Mechanism for Large Scale NLP Models
  Christophe Dupuy, Radhika Arava, Rahul Gupta, Anna Rumshisky · SyDa · 14 Jul 2021
BAGUA: Scaling up Distributed Learning with System Relaxations
  Shaoduo Gan, Xiangru Lian, Rui Wang, Jianbin Chang, Chengjun Liu, ..., Jiawei Jiang, Binhang Yuan, Sen Yang, Ji Liu, Ce Zhang · 03 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
  Chen Dun, Cameron R. Wolfe, C. Jermaine, Anastasios Kyrillidis · 02 Jul 2021
Flare: Flexible In-Network Allreduce
  Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler · 29 Jun 2021
BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
  Zhengchun Liu, R. Kettimuthu, M. Papka, Ian Foster · 22 Jun 2021
Secure Distributed Training at Scale
  Eduard A. Gorbunov, Alexander Borzunov, Michael Diskin, Max Ryabinin · FedML · 21 Jun 2021
Dynamic Gradient Aggregation for Federated Domain Adaptation
  Dimitrios Dimitriadis, K. Kumatani, R. Gmyr, Yashesh Gaur, Sefik Emre Eskimez · FedML · 14 Jun 2021
Communication-efficient SGD: From Local SGD to One-Shot Averaging
  Artin Spiridonoff, Alexander Olshevsky, I. Paschalidis · FedML · 09 Jun 2021
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images
  Mehdi Cherti, J. Jitsev · LM&MA · 31 May 2021
Bridging Data Center AI Systems with Edge Computing for Actionable Information Retrieval
  Zhengchun Liu, Ahsan Ali, Peter Kenesei, Antonino Miceli, Hemant Sharma, ..., Naoufal Layad, Jana Thayer, R. Herbst, Chun Hong Yoon, Ian Foster · 28 May 2021
Distributed Multigrid Neural Solvers on Megavoxel Domains
  Aditya Balu, Sergio Botelho, Biswajit Khara, Vinay Rao, Chinmay Hegde, Soumik Sarkar, Santi S. Adavani, A. Krishnamurthy, Baskar Ganapathysubramanian · AI4CE · 29 Apr 2021
End-to-End Jet Classification of Boosted Top Quarks with the CMS Open Data
  Michael Andrews, Bjorn Burkle, Yi-fan Chen, Davide DiCroce, S. Gleyzer, ..., N. Pervan, Yusef Shafi, Wei-Ju Sun, Emanuele Usai, Kun Yang · 19 Apr 2021
ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table
  Huifeng Guo, Wei Guo, Yong Gao, Ruiming Tang, Xiuqiang He, Wenzhi Liu · 17 Apr 2021
Graph Generative Models for Fast Detector Simulations in High Energy Physics
  A. Hariri, Darya Dyachkova, Sergei Gleyzer · AI4CE · 05 Apr 2021
Czert -- Czech BERT-like Model for Language Representation
  Jakub Sido, O. Pražák, P. Pribán, Jan Pasek, Michal Seják, Miloslav Konopík · 24 Mar 2021
Federated Quantum Machine Learning
  Samuel Yen-Chi Chen, Shinjae Yoo · FedML, AI4CE · 22 Mar 2021
Distributed Deep Learning Using Volunteer Computing-Like Paradigm
  Medha Atre, B. Jha, Ashwini Rao · 16 Mar 2021
On the Utility of Gradient Compression in Distributed Training Systems
  Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos · 28 Feb 2021
GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent
  Heesu Kim, Hanmin Park, Taehyun Kim, Kwanheum Cho, Eojin Lee, Soojung Ryu, Hyuk-Jae Lee, Kiyoung Choi, Jinho Lee · 15 Feb 2021