Optimizing Distributed ML Communication with Fused Computation-Collective Operations

Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann
11 May 2023 · arXiv 2305.06942 · Community: FedML
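For context on the technique the title names: the paper fuses computation with collective communication so that the two can hide each other's latency, rather than running them back to back. Below is a minimal, hypothetical sketch of the coarser host-side ancestor of that idea, overlapping an asynchronous all-reduce with an independent GEMM using PyTorch's `torch.distributed`. This is an illustration only, not the paper's implementation, which pursues tighter fusion than this overlap pattern.

```python
import os
import torch
import torch.distributed as dist

def overlapped_step(grad: torch.Tensor, a: torch.Tensor, b: torch.Tensor):
    """Overlap an all-reduce on `grad` with an independent GEMM.

    Coarse-grained sketch of computation/communication overlap; the
    paper's fused operations go further than this host-side version.
    """
    # Launch the collective asynchronously so the matmul below can run
    # while the gradient is being reduced across ranks.
    work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)
    c = a @ b    # independent computation proceeds concurrently
    work.wait()  # block until the reduced gradient is available
    return c, grad

if __name__ == "__main__":
    # Single-process group for demonstration only; real runs would span ranks.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    c, g = overlapped_step(torch.ones(4), torch.randn(8, 8), torch.randn(8, 8))
    print(g)  # unchanged with world_size=1; summed across ranks otherwise
    dist.destroy_process_group()
```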

Papers citing "Optimizing Distributed ML Communication with Fused Computation-Collective Operations"

4 of 4 citing papers shown.

Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
Haiyue Ma, Jian Liu, Ronny Krashinsky · 0 citations · 10 Oct 2024

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang, Chengming Zhang, Zheyu Shen, Ang Li, Olatunji Ruwase · 3 citations · 23 Sep 2024

The Landscape of GPU-Centric Communication
D. Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov · 2 citations · 15 Sep 2024

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE · 1,821 citations · 17 Sep 2019