Blink: Fast and Generic Collectives for Distributed ML

11 October 2019
Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, J. Thelin, Nikhil R. Devanur, Ion Stoica

Papers citing "Blink: Fast and Generic Collectives for Distributed ML"

14 papers shown

PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh, Junguk Hong, Chaemin Lim, Seong-Yeol Park, Jeehyun Kim, Hanjun Kim, Youngsok Kim, Jinho Lee
13 Apr 2024

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
09 Apr 2024

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
24 Jan 2023

Efficient All-reduce for Distributed DNN Training in Optical Interconnect System
Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang, Fangfang Zhang
22 Jul 2022

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs
Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, T. Krishna
22 Jul 2022

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul Chilimbi, Mu Li, Xin Jin
30 Apr 2022

Efficient Direct-Connect Topologies for Collective Communications
Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, P. Basu, J. Khoury, Arvind Krishnamurthy
07 Feb 2022

TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang, Moein Khazraee, Zhizhen Zhong, M. Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, A. Kewitsch
01 Feb 2022

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, T. Krishna
09 Oct 2021

Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data
Xiaodong Yu, Viktor V. Nikitin, Daniel J. Ching, Selin S. Aslan, D. Gursoy, Tekin Bicer
14 Jun 2021

Synthesizing Optimal Collective Algorithms
Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi
19 Aug 2020

Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems
Siyuan Zhuang, Zhuohan Li, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, Ion Stoica
13 Feb 2020

Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Lifu Zhang, T. Abdelrahman
29 Dec 2019

Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
26 Jul 2019