2308.00852
CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters
1 August 2023
S. Rajasekaran
M. Ghobadi
Aditya Akella
GNN
Papers citing
"CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters"
13 / 13 papers shown
Title
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang
Moein Khazraee
Zhizhen Zhong
M. Ghobadi
Zhihao Jia
Dheevatsa Mudigere
Ying Zhang
A. Kewitsch
50
89
0
01 Feb 2022
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Aashaka Shah
Vijay Chidambaram
M. Cowan
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Jacob Nelson
Olli Saarikivi
Rachee Singh
35
57
0
08 Nov 2021
Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models
Dheevatsa Mudigere
Y. Hao
Jianyu Huang
Zhihao Jia
Andrew Tulloch
...
Ajit Mathews
Lin Qiao
M. Smelyanskiy
Bill Jia
Vijay Rao
72
152
0
12 Apr 2021
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao
Sang Keun Choe
Suhas Jayaram Subramanya
Willie Neiswanger
Qirong Ho
Hao Zhang
G. Ganger
Eric Xing
VLM
50
181
0
27 Aug 2020
CamemBERT: a Tasty French Language Model
Louis Martin
Benjamin Muller
Pedro Ortiz Suarez
Yoann Dupont
Laurent Romary
Eric Villemonte de la Clergerie
Djamé Seddah
Benoît Sagot
96
970
0
10 Nov 2019
Blink: Fast and Generic Collectives for Distributed ML
Guanhua Wang
Shivaram Venkataraman
Amar Phanishayee
J. Thelin
Nikhil R. Devanur
Ion Stoica
VLM
44
138
0
11 Oct 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
310
1,892
0
17 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
514
24,351
0
26 Jul 2019
Deep Learning Recommendation Model for Personalization and Recommendation Systems
Maxim Naumov
Dheevatsa Mudigere
Hao-Jun Michael Shi
Jianyu Huang
Narayanan Sundaraman
...
Wenlin Chen
Vijay Rao
Bill Jia
Liang Xiong
M. Smelyanskiy
85
732
0
31 May 2019
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes
Peng Sun
Wansen Feng
Ruobing Han
Shengen Yan
Yonggang Wen
AI4CE
59
70
0
19 Feb 2019
Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue
Jaehoon Lee
J. Antognini
J. Mamou
J. Ketterling
Yao Wang
80
410
0
08 Nov 2018
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
...
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
119
1,768
0
05 Dec 2017
Wide Residual Networks
Sergey Zagoruyko
N. Komodakis
306
7,971
0
23 May 2016