Optimizing Distributed ML Communication with Fused Computation-Collective Operations

Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann
11 May 2023 · arXiv 2305.06942 · Community: FedML
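For context on the technique the title names: the paper fuses computation with collective communication so that the two can hide each other's latency, rather than running them back to back. Below is a minimal, hypothetical sketch of the coarser host-side ancestor of that idea, overlapping an asynchronous all-reduce with an independent GEMM using PyTorch's `torch.distributed`. This is an illustration only, not the paper's implementation, which pursues tighter fusion than this overlap pattern.

```python
import os
import torch
import torch.distributed as dist

def overlapped_step(grad: torch.Tensor, a: torch.Tensor, b: torch.Tensor):
    """Overlap an all-reduce on `grad` with an independent GEMM.

    Coarse-grained sketch of computation/communication overlap; the
    paper's fused operations go further than this host-side version.
    """
    # Launch the collective asynchronously so the matmul below can run
    # while the gradient is being reduced across ranks.
    work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)
    c = a @ b    # independent computation proceeds concurrently
    work.wait()  # block until the reduced gradient is available
    return c, grad

if __name__ == "__main__":
    # Single-process group for demonstration only; real runs would span ranks.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    c, g = overlapped_step(torch.ones(4), torch.randn(8, 8), torch.randn(8, 8))
    print(g)  # unchanged with world_size=1; summed across ranks otherwise
    dist.destroy_process_group()
```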

Papers citing "Optimizing Distributed ML Communication with Fused Computation-Collective Operations"

4 of 4 citing papers shown.

Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
Haiyue Ma, Jian Liu, Ronny Krashinsky · 0 citations · 10 Oct 2024

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang, Chengming Zhang, Zheyu Shen, Ang Li, Olatunji Ruwase · 3 citations · 23 Sep 2024

The Landscape of GPU-Centric Communication
D. Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov · 2 citations · 15 Sep 2024

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE · 1,821 citations · 17 Sep 2019