Horovod: fast and easy distributed deep learning in TensorFlow

15 February 2018

Papers citing "Horovod: fast and easy distributed deep learning in TensorFlow"

50 / 454 papers shown

Title
Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs Qiang-qiang Wang Shaoshuai Shi Canhui Wang Xiaowen Chu 70 13 0 24 Feb 2020
Communication-Efficient Edge AI: Algorithms and Systems Yuanming Shi Kai Yang Tao Jiang Jun Zhang Khaled B. Letaief GNN 99 334 0 22 Feb 2020
STANNIS: Low-Power Acceleration of Deep Neural Network Training Using Computational Storage Ali Heydarigorji Mahdi Torabzadehkashi Siavash Rezaei Hossein Bobarshad V. Alves Pai H. Chou BDL 31 5 0 17 Feb 2020
Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems Siyuan Zhuang Zhuohan Li Danyang Zhuo Stephanie Wang Eric Liang Robert Nishihara Philipp Moritz Ion Stoica 40 24 0 13 Feb 2020
Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent Giorgi Nadiradze Ilia Markov Bapi Chatterjee Vyacheslav Kungurtsev Dan Alistarh FedML 121 14 0 16 Jan 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well Vipul Gupta S. Serrano D. DeCoste MoMe 83 60 0 07 Jan 2020
High-Performance Statistical Computing in the Computing Environments of the 2020s Seyoon Ko Hua Zhou Jin J. Zhou Joong-Ho Won 55 8 0 07 Jan 2020
Attention based on-device streaming speech recognition with large speech corpus Kwangyoun Kim Kyungmin Lee Dhananjaya N. Gowda Junmo Park Sungsoo Kim ... Daehyun Kim Seokyeong Jung Jungin Lee Myoungji Han Chanwoo Kim 55 58 0 02 Jan 2020
end-to-end training of a large vocabulary end-to-end speech recognition system Chanwoo Kim Sungsoo Kim Kwangyoun Kim Mehul Kumar Jiyeon Kim ... Eunhyang Kim Minkyoo Shin Shatrughan Singh Larry Heck Dhananjaya N. Gowda 61 27 0 22 Dec 2019
A Survey on Distributed Machine Learning Joost Verbraeken Matthijs Wolting Jonathan Katzy Jeroen Kloppenburg Tim Verbelen Jan S. Rellermeyer OOD 122 715 0 20 Dec 2019
Mastering Complex Control in MOBA Games with Deep Reinforcement Learning Deheng Ye Zhao Liu Mingfei Sun Bei Shi P. Zhao ... Tengfei Shi Liang Wang Qiang Fu Wei Yang Lanxiao Huang 67 324 0 20 Dec 2019
C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation Qihang Yu Dong Yang H. Roth Yutong Bai Yixiao Zhang Alan Yuille Daguang Xu 110 109 0 20 Dec 2019
PolyTransform: Deep Polygon Transformer for Instance Segmentation Justin Liang N. Homayounfar Wei-Chiu Ma Yuwen Xiong Rui Hu R. Urtasun ViT ISeg 121 177 0 05 Dec 2019
Simulation-based reinforcement learning for real-world autonomous driving B. Osinski Adam Jakubowski Piotr Milos Pawel Ziecina Christopher Galias S. Homoceanu Henryk Michalewski 112 122 0 29 Nov 2019
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates Cong Xie Oluwasanmi Koyejo Indranil Gupta Yanghua Peng 66 42 0 20 Nov 2019
MetH: A family of high-resolution and variable-shape image challenges Ferran Parés Dario Garcia-Gasulla Harald Servat Jesús Labarta Eduard Ayguadé 44 0 0 20 Nov 2019
Auto-Precision Scaling for Distributed Deep Learning Ruobing Han J. Demmel Yang You 43 5 0 20 Nov 2019
Understanding Top-k Sparsification in Distributed Deep Learning Shaoshuai Shi Xiaowen Chu Ka Chun Cheung Simon See 233 101 0 20 Nov 2019
Label-similarity Curriculum Learning Ürün Dogan A. Deshmukh Marcin Machura Christian Igel 61 21 0 15 Nov 2019
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow A. A. Awan Arpan Jain Quentin G. Anthony Hari Subramoni Dhabaleswar K. Panda MoE AI4CE 55 5 0 12 Nov 2019
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames Erik Wijmans Abhishek Kadian Ari S. Morcos Stefan Lee Irfan Essa Devi Parikh Manolis Savva Dhruv Batra 97 484 0 01 Nov 2019
Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs Liu Yang Sean Treichler Thorsten Kurth Keno Fischer D. Barajas-Solano ... Valentin Churavy A. Tartakovsky Michael Houston P. Prabhat George Karniadakis AI4CE 87 39 0 29 Oct 2019
Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks Davit Buniatyan GNN 50 5 0 16 Oct 2019
Parallelized Training of Restricted Boltzmann Machines using Markov-Chain Monte Carlo Methods Pei Yang S. Varadharajan Lucas A. Wilson Don D. Smith John A. Lockman Vineet Gundecha Quy Ta BDL 28 1 0 14 Oct 2019
Blink: Fast and Generic Collectives for Distributed ML Guanhua Wang Shivaram Venkataraman Amar Phanishayee J. Thelin Nikhil R. Devanur Ion Stoica VLM 65 142 0 11 Oct 2019
Distributed Learning of Deep Neural Networks using Independent Subnet Training John Shelton Hyatt Cameron R. Wolfe Michael Lee Yuxin Tang Anastasios Kyrillidis Christopher M. Jermaine OOD 92 39 0 04 Oct 2019
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models Samyam Rajbhandari Jeff Rasley Olatunji Ruwase Yuxiong He ALM AI4CE 92 923 0 04 Oct 2019
Training Multiscale-CNN for Large Microscopy Image Classification in One Hour Kushal Datta Imtiaz Hossain Sun Choi V. Saletore Kyle H. Ambert William J. Godinez Xian Zhang 19 4 0 03 Oct 2019
Accelerating Data Loading in Deep Neural Network Training Chih-Chieh Yang Guojing Cong 78 38 0 02 Oct 2019
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos Ji Lin Chuang Gan Song Han 78 10 0 01 Oct 2019
Deep learning at scale for subgrid modeling in turbulent flows Mathis Bode M. Gauding K. Kleinheinz H. Pitsch AI4CE 66 21 0 01 Oct 2019
Elastic deep learning in multi-tenant GPU cluster Yidi Wu Kaihao Ma Xiao Yan Zhi Liu Zhenkun Cai Yuzhen Huang James Cheng Han Yuan Fan Yu 25 2 0 26 Sep 2019
Exascale Deep Learning for Scientific Inverse Problems N. Laanait Josh Romero Junqi Yin M. T. Young Sean Treichler V. Starchenko A. Borisevich Alexander Sergeev Michael A. Matheson FedML BDL 68 29 0 24 Sep 2019
A Random Gossip BMUF Process for Neural Language Modeling Yiheng Huang Jinchuan Tian Lei Han Guangsen Wang Xingcheng Song Jane Polak Scowcroft Dong Yu 37 3 0 19 Sep 2019
From Server-Based to Client-Based Machine Learning: A Comprehensive Survey Renjie Gu Chaoyue Niu Fan Wu Guihai Chen Chun Hu Chengfei Lyu Zhihua Wu 97 26 0 18 Sep 2019
Heterogeneity-Aware Asynchronous Decentralized Training Qinyi Luo Jiaao He Youwei Zhuo Xuehai Qian 62 8 0 17 Sep 2019
FfDL: A Flexible Multi-tenant Deep Learning Platform K.R. Jayaram Vinod Muthusamy Parijat Dube Vatche Isahagian Chen Wang ... Diana Arroyo Asser Tantawi Archit Verma Falk Pollok Rania Y. Khalaf VLM 38 21 0 14 Sep 2019
Accelerating Training using Tensor Decomposition Mostafa Elhoushi Ye Tian Zihao Chen F. Shafiq Joey Yiwei Li 39 3 0 10 Sep 2019
Performance Analysis and Comparison of Distributed Machine Learning Systems S. Alqahtani Murat Demirbas 38 26 0 04 Sep 2019
Guardians of the Deep Fog: Failure-Resilient DNN Inference from Edge to Cloud Ashkan Yousefpour Siddartha Devic Brian Q. Nguyen Aboudy Kreidieh Alan Liao Alexandre M. Bayen J. Jue FedML GNN 64 24 0 03 Sep 2019
NEZHA: Neural Contextualized Representation for Chinese Language Understanding Junqiu Wei Xiaozhe Ren Xiaoguang Li Wenyong Huang Yi-Lun Liao Yasheng Wang Jianghao Lin Xin Jiang Xiao Chen Qun Liu 82 116 0 31 Aug 2019
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML Ashvin Agrawal Rony Chatterjee Carlo Curino Avrilia Floratou Neha Godwal ... Karla Saur Rathijit Sen Markus Weimer Travis Wright Yiwen Zhu 135 40 0 30 Aug 2019
TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir S. Samsi Michael Houle 24 4 0 29 Aug 2019
Distributed Deep Learning for Precipitation Nowcasting S. Samsi Christopher J. Mattioli Mark S. Veillette 77 23 0 28 Aug 2019
Dynamic Scheduling of MPI-based Distributed Deep Learning Training Jobs Tim Capes Vishal Raheja Mete Kemertas Iqbal Mohomed AI4CE 18 3 0 21 Aug 2019
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations Shigang Li Tal Ben-Nun Salvatore Di Girolamo Dan Alistarh Torsten Hoefler 147 59 0 12 Aug 2019
Chainer: A Deep Learning Framework for Accelerating the Research Cycle Seiya Tokui Ryosuke Okuta Takuya Akiba Yusuke Niitani Toru Ogawa Shunta Saito Shuji Suzuki Kota Uenishi Brian K. Vogel Hiroyuki Yamazaki Vincent BDL AI4CE 84 130 0 01 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training Saptadeep Pal Eiman Ebrahimi A. Zulfiqar Yaosheng Fu Victor Zhang Szymon Migacz D. Nellans Puneet Gupta 92 59 0 30 Jul 2019
HPC AI500: A Benchmark Suite for HPC AI Systems Zihan Jiang Wanling Gao Lei Wang Xingwang Xiong Yuchen Zhang ... Yunquan Zhang Shengzhong Feng KenLi Li Weijia Xu Jianfeng Zhan ELM 68 40 0 27 Jul 2019
PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch Liang Lu Xiong Xiao Zhuo Chen Jiawei Liu 90 14 0 12 Jul 2019