1910.04940
Blink: Fast and Generic Collectives for Distributed ML
11 October 2019
Guanhua Wang
Shivaram Venkataraman
Amar Phanishayee
J. Thelin
Nikhil R. Devanur
Ion Stoica
VLM
Papers citing "Blink: Fast and Generic Collectives for Distributed ML"
14 / 14 papers shown
Title
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh
Junguk Hong
Chaemin Lim
Seong-Yeol Park
Jeehyun Kim
Hanjun Kim
Youngsok Kim
Jinho Lee
34
7
0
13 Apr 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang
Zhen Zhang
Haifeng Lu
Victor C. M. Leung
Yanyi Guo
Xiping Hu
GNN
37
6
0
09 Apr 2024
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
24
25
0
24 Jan 2023
Efficient All-reduce for Distributed DNN Training in Optical Interconnect System
Fei Dai
Yawen Chen
Zhiyi Huang
Haibo Zhang
Fangfang Zhang
9
7
0
22 Jul 2022
Impact of RoCE Congestion Control Policies on Distributed Training of DNNs
Tarannum Khan
Saeed Rashidi
Srinivas Sridharan
Pallavi Shurpali
Aditya Akella
T. Krishna
OOD
34
11
0
22 Jul 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang
Shuai Zheng
Yida Wang
Justin Chiu
George Karypis
Trishul Chilimbi
Mu Li
Xin Jin
19
39
0
30 Apr 2022
Efficient Direct-Connect Topologies for Collective Communications
Liangyu Zhao
Siddharth Pal
Tapan Chugh
Weiyang Wang
Jason Fantl
P. Basu
J. Khoury
Arvind Krishnamurthy
42
6
0
07 Feb 2022
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang
Moein Khazraee
Zhizhen Zhong
M. Ghobadi
Zhihao Jia
Dheevatsa Mudigere
Ying Zhang
A. Kewitsch
39
81
0
01 Feb 2022
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi
William Won
Sudarshan Srinivasan
Srinivas Sridharan
T. Krishna
GNN
30
29
0
09 Oct 2021
Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data
Xiaodong Yu
Viktor V. Nikitin
Daniel J. Ching
Selin S. Aslan
D. Gursoy
Tekin Bicer
24
19
0
14 Jun 2021
Synthesizing Optimal Collective Algorithms
Zixian Cai
Zhengyang Liu
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Jacob Nelson
Olli Saarikivi
GNN
26
59
0
19 Aug 2020
Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems
Siyuan Zhuang
Zhuohan Li
Danyang Zhuo
Stephanie Wang
Eric Liang
Robert Nishihara
Philipp Moritz
Ion Stoica
22
23
0
13 Feb 2020
Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Lifu Zhang
T. Abdelrahman
21
0
0
29 Dec 2019
Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi
Saar Barkai
Moshe Gabel
Assaf Schuster
16
23
0
26 Jul 2019