ResearchTrend.AI

Home › Papers › 1802.05799 › Cited By

Horovod: fast and easy distributed deep learning in TensorFlow

15 February 2018
Alexander Sergeev
Mike Del Balso
arXiv (abs) · PDF · HTML · GitHub (14,494★)
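For context on the cited paper: Horovod's data parallelism is built on the ring-allreduce collective for summing gradients across workers. The communication pattern can be sketched as a single-process simulation in plain Python; this is illustrative only, not Horovod's actual MPI/NCCL implementation, and the `ring_allreduce` function and its list-of-lists interface are invented here for clarity:

```python
def ring_allreduce(tensors):
    """Single-process simulation of ring all-reduce over n workers.

    Each worker's vector is split into n chunks. A scatter-reduce phase
    (n-1 steps) leaves each worker holding one fully summed chunk; an
    allgather phase (n-1 steps) then circulates those chunks until every
    worker holds the complete element-wise sum.
    """
    n = len(tensors)
    size = len(tensors[0])
    chunks = [list(t) for t in tensors]  # per-worker working copies
    bounds = [(i * size // n, (i + 1) * size // n) for i in range(n)]

    # Scatter-reduce: at step s, worker i sends chunk (i - s) mod n to
    # worker (i + 1) mod n, which adds it into its own copy.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):  # snapshot the simultaneous sends first
            c = (i - step) % n
            lo, hi = bounds[c]
            outgoing.append((c, chunks[i][lo:hi]))
        for i in range(n):
            c, data = outgoing[i]
            lo, hi = bounds[c]
            dst = (i + 1) % n
            for k in range(lo, hi):
                chunks[dst][k] += data[k - lo]

    # Allgather: at step s, worker i forwards its completed chunk
    # (i + 1 - s) mod n one hop around the ring.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            outgoing.append((c, chunks[i][lo:hi]))
        for i in range(n):
            c, data = outgoing[i]
            lo, hi = bounds[c]
            chunks[(i + 1) % n][lo:hi] = data

    return chunks


# Three workers, each contributing a gradient vector; all end with the sum.
result = ring_allreduce([[1.0, 2.0, 4.0], [10.0, 20.0, 40.0], [100.0, 200.0, 400.0]])
print(result)  # every worker holds [111.0, 222.0, 444.0]
```

Each of the 2(n-1) steps moves only 1/n of the vector per worker, which is why the bandwidth cost stays nearly constant as workers are added.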

Papers citing "Horovod: fast and easy distributed deep learning in TensorFlow"

50 / 454 papers shown
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Alex Iacob, Lorenzo Sani, M. Safaryan, Paris Giampouras, Samuel Horváth, ..., Meghdad Kurmanji, Preslav Aleksandrov, William F. Shen, Xinchi Qiu, Nicholas D. Lane
28 May 2025

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training
Juntao Zhao, Qi Lu, Wei Jia, Borui Wan, Lei Zuo, ..., Size Zheng, Yanghua Peng, H. Lin, Xin Liu, Chuan Wu
14 Apr 2025

Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints
Yuhao Zhou, Yuxin Tian, Jindi Lv, Mingjia Shi, Yuanxi Li, Qing Ye, Shuhao Zhang, Jiancheng Lv
15 Mar 2025

Weak Supervision for Improved Precision in Search Systems
Sriram Vasudevan
10 Mar 2025

ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
Hao Ge, Junda Feng, Qi Huang, Fangcheng Fu, Xiaonan Nie, Lei Zuo, Yanghua Peng, Tengjiao Wang, Xin Liu
28 Feb 2025

Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
Ziyue Luo, Jia-Wei Liu, Myungjin Lee, Ness B. Shroff
09 Jan 2025

Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng-rong Li
24 Nov 2024

Photon: Federated LLM Pre-Training
Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, ..., Dongqi Cai, Zexi Li, Wanru Zhao, Xinchi Qiu, Nicholas D. Lane
05 Nov 2024

Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
Runsheng Benson Guo, Utkarsh Anand, Arthur Chen, Khuzaima Daudjee
01 Nov 2024

A Novel Breast Ultrasound Image Augmentation Method Using Advanced Neural Style Transfer: An Efficient and Explainable Approach
Lipismita Panigrahi, Prianka Rani Saha, Jurdana Masuma Iqrah, Sushil Prasad
31 Oct 2024

Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading
Avinash Maurya, Jie Ye, M. Rafique, Franck Cappello, Bogdan Nicolae
26 Oct 2024

Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Haoyang Li, Fangcheng Fu, Hao Ge, Sheng Lin, Xuanyu Wang, Jiawen Niu, Yijiao Wang, Hailin Zhang, Xiaonan Nie, Tengjiao Wang
17 Oct 2024

From promise to practice: realizing high-performance decentralized training
Zesen Wang, Jiaojiao Zhang, Xuyang Wu, M. Johansson
15 Oct 2024

Breaking the mold: The challenge of large scale MARL specialization
Stefan Juang, Hugh Cao, Arielle Zhou, Ruochen Liu, Nevin L. Zhang, Elvis Liu
03 Oct 2024

HybridFlow: A Flexible and Efficient RLHF Framework
Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Size Zheng, Haibin Lin, Chuan Wu
28 Sep 2024

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang, Chengming Zhang, Zheyu Shen, Ang Li, Olatunji Ruwase
23 Sep 2024

Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
Chelsea Maria John, Stepan Nassyr, Carolin Penke, A. Herten
19 Sep 2024

Revisiting the Time Cost Model of AllReduce
Dian Xiong, Li Chen, Youhe Jiang, Dan Li, Shuai Wang, Songtao Wang
06 Sep 2024

Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices
Shengyuan Ye, Liekang Zeng, Xiaowen Chu, Guoliang Xing, Xu Chen
15 Aug 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
Zhengxian Lu, Fangyu Wang, Zhiwei Xu, Fei Yang, Tao Li
02 Jul 2024

Hybrid Approach to Parallel Stochastic Gradient Descent
Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel
27 Jun 2024

Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars
Wesley Brewer, Aditya Kashi, Sajal Dash, A. Tsaris, Junqi Yin, Mallikarjun Shankar, Feiyi Wang
24 Jun 2024

AI-coupled HPC Workflow Applications, Middleware and Performance
Wes Brewer, Ana Gainaru, Frédéric Suter, Feiyi Wang, M. Emani, S. Jha
20 Jun 2024

SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver
Daniel Lersch, Malachi Schram, Zhenyu Dai, Kishansingh Rajput, Xingfu Wu, Nobuo Sato, J. T. Childers
11 Jun 2024

Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training
Ray Cao, Sherry Luo, Steve Gan, Sujeeth Jinesh
08 Jun 2024

Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers
Thomas Bouvier, Bogdan Nicolae, Hugo Chaugier, Alexandru Costan, Ian Foster, Gabriel Antoniu
05 Jun 2024

Full-Stack Allreduce on Multi-Rail Networks
Enda Yu, Dezun Dong, Xiangke Liao
28 May 2024

Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen
27 May 2024

Apply Distributed CNN on Genomics to accelerate Transcription-Factor TAL1 Motif Prediction
Tasnim Assali, Zayneb Trabelsi Ayoub, Sofiane Ouni
25 May 2024

Worldwide Federated Training of Language Models
Alexandru Iacob, Lorenzo Sani, Bill Marino, Preslav Aleksandrov, William F. Shen, Nicholas D. Lane
23 May 2024

SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures
Swapnil Gandhi, Mark Zhao, Athinagoras Skiadopoulos, Christos Kozyrakis
22 May 2024

Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks
Taiyuan Mei, Yun Zi, X. Cheng, Zijun Gao, Qi Wang, Haowei Yang
20 May 2024

The Future of Large Language Model Pre-training is Federated
Lorenzo Sani, Alexandru Iacob, Zeyu Cao, Bill Marino, Yan Gao, ..., Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane
17 May 2024

DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
Ye Tian, Zhen Jia, Ziyue Luo, Yida Wang, Chuan Wu
02 May 2024

MalleTrain: Deep Neural Network Training on Unfillable Supercomputer Nodes
Xiaolong Ma, Feng Yan, Lei Yang, Ian Foster, M. Papka, Zhengchun Liu, R. Kettimuthu
24 Apr 2024

I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis, J. L. Bez, Suren Byna
16 Apr 2024

AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes
Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, Zhaoxin Huan, ..., Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou
15 Apr 2024

On the Efficiency of Privacy Attacks in Federated Learning
Nawrin Tabassum, Ka-Ho Chow, Xuyu Wang, Wenbin Zhang, Yanzhao Wu
15 Apr 2024

pfl-research: simulation framework for accelerating research in Private Federated Learning
Filip Granqvist, Congzheng Song, Áine Cahill, Rogier van Dalen, Martin Pelikan, Yi Sheng Chan, Xiaojun Feng, Natarajan Krishnaswami, Vojta Jina, Mona Chitnis
09 Apr 2024

A Survey on Error-Bounded Lossy Compression for Scientific Datasets
Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, ..., Jon C. Calhoun, Guanpeng Li, Kazutomo Yoshii, Khalid Ayed Alharthi, Franck Cappello
03 Apr 2024

CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks
Xin Huang, Weipeng Zhuo, Minh Phu Vuong, Shiju Li, Jongryool Kim, Bradley Rees, Chul-Ho Lee
02 Apr 2024

Satellite Federated Edge Learning: Architecture Design and Convergence Analysis
Yuanming Shi, Li Zeng, Jingyang Zhu, Yong Zhou, Chunxiao Jiang, Khaled B. Letaief
02 Apr 2024

Union: An Automatic Workload Manager for Accelerating Network Simulation
Xin Wang, Misbah Mubarak, Yao Kang, R. Ross, Z. Lan
25 Mar 2024

Study of Workload Interference with Intelligent Routing on Dragonfly
Yao Kang, Xin Wang, Z. Lan
24 Mar 2024

A Parallel Workflow for Polar Sea-Ice Classification using Auto-labeling of Sentinel-2 Imagery
Jurdana Masuma Iqrah, Wei Wang, Hongjie Xie, Sushil Prasad
19 Mar 2024

Edge-Disjoint Spanning Trees on Star-Product Networks
Aleyah Dawkins, K. Isham, Aleš Kubíček, Kartik Lakhotia, Laura Monroe
18 Mar 2024

ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment
Xiaofeng Wu, Jia Rao, Wei Chen
15 Mar 2024

DeepVM: Integrating Spot and On-Demand VMs for Cost-Efficient Deep Learning Clusters in the Cloud
Yoochan Kim, Kihyun Kim, Yonghyeon Cho, Jinwoo Kim, Awais Khan, Ki-Dong Kang, B. An, Myung-Hoon Cha, H. Kim, Youngjae Kim
09 Mar 2024

ForestColl: Throughput-Optimal Collective Communications on Heterogeneous Network Fabrics
Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Aashaka Shah
09 Feb 2024