Highly Available Data Parallel ML training on Mesh Networks

6 November 2020
Sameer Kumar, N. Jouppi
MoE, AI4CE
Papers citing "Highly Available Data Parallel ML training on Mesh Networks"

4 of 4 papers shown
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna
28 Jun 2024
Near-Optimal Wafer-Scale Reduce
Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler
24 Apr 2024
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning
William Won, Suvinay Subramanian, Sudarshan Srinivasan, A. Durg, Samvit Kaul, Swati Gupta, Tushar Krishna
11 Apr 2023
On the Generalization Mystery in Deep Learning
S. Chatterjee, Piotr Zielinski
OOD
18 Mar 2022