Synthesizing Optimal Collective Algorithms (arXiv 2008.08708)

19 August 2020
Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi

Papers citing "Synthesizing Optimal Collective Algorithms"

30 papers
Enabling Reconfiguration-Communication Overlap for Collective Communication in Optical Networks
Changbo Wu, Zhuolong Yu, Gongming Zhao, Hongli Xu
22 Oct 2025

Robust Heuristic Algorithm Design with LLMs
Pantea Karimi, Dany Rouhana, Pooria Namyar, Siva Kesava Reddy Kakarla, Venkat Arun, Behnaz Arzani
09 Oct 2025

Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality
Daniele De Sensi, Saverio Pasqualoni, Lorenzo Piarulli, Tommaso Bonato, Seydou Ba, M. Turisini, Jens Domke, Torsten Hoefler
24 Aug 2025

Load Balancing for AI Training Workloads
Sarah McClure, Sylvia Ratnasamy, Scott Shenker, Mark Silberstein, Isaac Keslassy
28 Jul 2025

Efficient AllReduce with Stragglers
Arjun Devraj, Eric Ding, Abhishek Vijaya Kumar, Robert Kleinberg, Rachee Singh
29 May 2025

MSCCL++: Rethinking GPU Communication Abstractions for AI Inference
Aashaka Shah, Abhinav Jangda, Yangqiu Song, Caio Rocha, Changho Hwang, ..., Roshan Dathathri, Saeed Maleki, Ziyue Yang, Sreevatsa Anantharamu, Jithin Jose
11 Apr 2025

Lion Cub: Minimizing Communication Overhead in Distributed Lion
Satoki Ishikawa, Tal Ben-Nun, B. Van Essen, Rio Yokota, Nikoli Dryden
25 Nov 2024

FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing
Yu Wang, Junxiao Deng, Minchen Yu, Yue Yu, Yaochen Liu, Hao Fan, Song Wu, Wei Wang
04 Nov 2024

The Landscape of GPU-Centric Communication
Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov
15 Sep 2024

HiCCL: A Hierarchical Collective Communication Library
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024
Mert Hidayetoğlu, Simon Garcia De Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei W. Hwu, William Gropp, Alex Aiken
12 Aug 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Yang Liu
29 Jul 2024

PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh, Junguk Hong, Chaemin Lim, Seong-Yeol Park, Jeehyun Kim, Hanjun Kim, Youngsok Kim, Jinho Lee
13 Apr 2024

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
09 Apr 2024

Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities
IEEE Network (IEEE Netw.), 2024
Yunze Wei, Tianshuo Hu, Cong Liang, Yong Cui
12 Mar 2024

ForestColl: Throughput-Optimal Collective Communications on Heterogeneous Network Fabrics
Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Aashaka Shah, Changho Hwang, Arvind Krishnamurthy
09 Feb 2024

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair
30 Jan 2024

Swing: Short-cutting Rings for Higher Bandwidth Allreduce
Symposium on Networked Systems Design and Implementation (NSDI), 2024
Daniele De Sensi, Tommaso Bonato, D. Saam, Torsten Hoefler
17 Jan 2024

Bidirectional Reactive Programming for Machine Learning
D. Potop-Butucaru, Albert Cohen, Gordon Plotkin, Hugo Pompougnac
28 Nov 2023

Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies
IEEE International Symposium on High-Performance Parallel Distributed Computing (HPDC), 2023
P. Basu, Liangyu Zhao, Jason Fantl, Siddharth Pal, Arvind Krishnamurthy, J. Khoury
24 Sep 2023

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Tengjiao Wang
17 May 2023

Optimizing Distributed ML Communication with Fused Computation-Collective Operations
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann
11 May 2023

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2023
Quentin G. Anthony, A. A. Awan, Jeff Rasley, Yuxiong He, Hari Subramoni, Mustafa Abduljabbar, D. Panda
15 Mar 2023

OCCL: a Deadlock-free Library for GPU Collective Communication
Lichen Pan, Juncheng Liu, Jinhui Yuan, Rongkai Zhang, Pengze Li, Zhen Xiao
11 Mar 2023

On Optimizing the Communication of Model Parallelism
Conference on Machine Learning and Systems (MLSys), 2022
Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Haotong Zhang
10 Nov 2022

Efficient Direct-Connect Topologies for Collective Communications
Symposium on Networked Systems Design and Implementation (NSDI), 2022
Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, P. Basu, J. Khoury, Arvind Krishnamurthy
07 Feb 2022

GC3: An Optimizing Compiler for GPU Collective Communication
M. Cowan, Saeed Maleki, Madan Musuvathi, Olli Saarikivi, Yifan Xiong
27 Jan 2022

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Aashaka Shah, Vijay Chidambaram, M. Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh
08 Nov 2021

Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning
Conference on Machine Learning and Systems (MLSys), 2021
Ningning Xie, Tamara Norman, Dominik Grewe, Dimitrios Vytiniotis
20 Oct 2021

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
International Symposium on Computer Architecture (ISCA), 2021
Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, T. Krishna
09 Oct 2021

Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models
International Symposium on Computer Architecture (ISCA), 2021
Dheevatsa Mudigere, Y. Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, ..., Ajit Mathews, Lin Qiao, M. Smelyanskiy, Bill Jia, Vijay Rao
12 Apr 2021