Synthesizing Optimal Collective Algorithms (arXiv 2008.08708)

19 August 2020
Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi

Papers citing "Synthesizing Optimal Collective Algorithms"

30 papers
Enabling Reconfiguration-Communication Overlap for Collective Communication in Optical Networks
Changbo Wu, Zhuolong Yu, Gongming Zhao, Hongli Xu
22 Oct 2025

Robust Heuristic Algorithm Design with LLMs
Pantea Karimi, Dany Rouhana, Pooria Namyar, Siva Kesava Reddy Kakarla, Venkat Arun, Behnaz Arzani
09 Oct 2025

Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality
Daniele De Sensi, Saverio Pasqualoni, Lorenzo Piarulli, Tommaso Bonato, Seydou Ba, M. Turisini, Jens Domke, Torsten Hoefler
24 Aug 2025

Load Balancing for AI Training Workloads
Sarah McClure, Sylvia Ratnasamy, Scott Shenker, Mark Silberstein, Isaac Keslassy
28 Jul 2025

Efficient AllReduce with Stragglers
Arjun Devraj, Eric Ding, Abhishek Vijaya Kumar, Robert Kleinberg, Rachee Singh
29 May 2025

MSCCL++: Rethinking GPU Communication Abstractions for AI Inference
Aashaka Shah, Abhinav Jangda, Yangqiu Song, Caio Rocha, Changho Hwang, ..., Roshan Dathathri, Saeed Maleki, Ziyue Yang, Sreevatsa Anantharamu, Jithin Jose
11 Apr 2025

Lion Cub: Minimizing Communication Overhead in Distributed Lion
Satoki Ishikawa, Tal Ben-Nun, B. Van Essen, Rio Yokota, Nikoli Dryden
25 Nov 2024

FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing
Yu Wang, Junxiao Deng, Minchen Yu, Yue Yu, Yaochen Liu, Hao Fan, Song Wu, Wei Wang
04 Nov 2024

The Landscape of GPU-Centric Communication
Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov
15 Sep 2024

HiCCL: A Hierarchical Collective Communication Library
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024
Mert Hidayetoğlu, Simon Garcia De Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei W. Hwu, William Gropp, Alex Aiken
12 Aug 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Yang Liu
29 Jul 2024

PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh, Junguk Hong, Chaemin Lim, Seong-Yeol Park, Jeehyun Kim, Hanjun Kim, Youngsok Kim, Jinho Lee
13 Apr 2024

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
09 Apr 2024

Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities
IEEE Network (IEEE Netw.), 2024
Yunze Wei, Tianshuo Hu, Cong Liang, Yong Cui
12 Mar 2024

ForestColl: Throughput-Optimal Collective Communications on Heterogeneous Network Fabrics
Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Aashaka Shah, Changho Hwang, Arvind Krishnamurthy
09 Feb 2024

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair
30 Jan 2024

Swing: Short-cutting Rings for Higher Bandwidth Allreduce
Symposium on Networked Systems Design and Implementation (NSDI), 2024
Daniele De Sensi, Tommaso Bonato, D. Saam, Torsten Hoefler
17 Jan 2024

Bidirectional Reactive Programming for Machine Learning
D. Potop-Butucaru, Albert Cohen, Gordon Plotkin, Hugo Pompougnac
28 Nov 2023

Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies
IEEE International Symposium on High-Performance Parallel Distributed Computing (HPDC), 2023
P. Basu, Liangyu Zhao, Jason Fantl, Siddharth Pal, Arvind Krishnamurthy, J. Khoury
24 Sep 2023

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Tengjiao Wang
17 May 2023

Optimizing Distributed ML Communication with Fused Computation-Collective Operations
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann
11 May 2023

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2023
Quentin G. Anthony, A. A. Awan, Jeff Rasley, Yuxiong He, Hari Subramoni, Mustafa Abduljabbar, D. Panda
15 Mar 2023

OCCL: a Deadlock-free Library for GPU Collective Communication
Lichen Pan, Juncheng Liu, Jinhui Yuan, Rongkai Zhang, Pengze Li, Zhen Xiao
11 Mar 2023

On Optimizing the Communication of Model Parallelism
Conference on Machine Learning and Systems (MLSys), 2022
Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Haotong Zhang
10 Nov 2022

Efficient Direct-Connect Topologies for Collective Communications
Symposium on Networked Systems Design and Implementation (NSDI), 2022
Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, P. Basu, J. Khoury, Arvind Krishnamurthy
07 Feb 2022

GC3: An Optimizing Compiler for GPU Collective Communication
M. Cowan, Saeed Maleki, Madan Musuvathi, Olli Saarikivi, Yifan Xiong
27 Jan 2022

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Aashaka Shah, Vijay Chidambaram, M. Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh
08 Nov 2021

Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning
Conference on Machine Learning and Systems (MLSys), 2021
Ningning Xie, Tamara Norman, Dominik Grewe, Dimitrios Vytiniotis
20 Oct 2021

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
International Symposium on Computer Architecture (ISCA), 2021
Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, T. Krishna
09 Oct 2021

Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models
International Symposium on Computer Architecture (ISCA), 2021
Dheevatsa Mudigere, Y. Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, ..., Ajit Mathews, Lin Qiao, M. Smelyanskiy, Bill Jia, Vijay Rao
12 Apr 2021