Highly Available Data Parallel ML training on Mesh Networks
arXiv:2011.03605 · 6 November 2020
Sameer Kumar, N. Jouppi
Papers citing "Highly Available Data Parallel ML training on Mesh Networks" (4 papers)
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna
28 Jun 2024
Near-Optimal Wafer-Scale Reduce
Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler
24 Apr 2024
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning
William Won, Suvinay Subramanian, Sudarshan Srinivasan, A. Durg, Samvit Kaul, Swati Gupta, Tushar Krishna
11 Apr 2023
On the Generalization Mystery in Deep Learning
S. Chatterjee, Piotr Zielinski
18 Mar 2022