2008.09213
Cited By
Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
20 August 2020
Deepak Narayanan
Keshav Santhanam
Fiodar Kazhamiaka
Amar Phanishayee
Matei A. Zaharia
Papers citing
"Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads"
22 / 22 papers shown
Title
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor
Seungbeom Choi
Jeonghoe Goo
Eunjoo Jeon
Mingyu Yang
Minsung Jang
26
0
0
14 May 2025
Patchwork: A Unified Framework for RAG Serving
Bodun Hu
Luis Pabon
Saurabh Agarwal
Aditya Akella
31
0
0
01 May 2025
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Tyler Griggs
Xiaoxuan Liu
Jiaxiang Yu
Doyoung Kim
Wei-Lin Chiang
Alvin Cheung
Ion Stoica
54
16
0
22 Apr 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang
Zhen Zhang
Haifeng Lu
Victor C. M. Leung
Yanyi Guo
Xiping Hu
GNN
42
6
0
09 Apr 2024
A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters
Chunyu Xue
Weihao Cui
Han Zhao
Quan Chen
Shulai Zhang
Peng Yang
Jing Yang
Shaobo Li
Minyi Guo
56
2
0
24 Mar 2024
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
Yuting Yang
Andrea Merlina
Weijia Song
Tiancheng Yuan
Ken Birman
Roman Vitenberg
49
0
0
27 Feb 2024
Towards providing reliable job completion time predictions using PCS
Abdullah Bin Faisal
Noah Martin
Hafiz Mohsin Bashir
Swaminathan Lamelas
Fahad R. Dogar
27
0
0
18 Jan 2024
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang
Lansong Diao
Chuan Wu
Zongyan Cao
Siyu Wang
Wei Lin
45
12
0
11 Jan 2024
Scheduling Multi-Server Jobs with Sublinear Regrets via Online Learning
Hailiang Zhao
Shuiguang Deng
Zhengzhe Xiang
Xueqiang Yan
Yuxiang Cai
Schahram Dustdar
Albert Y. Zomaya
41
1
0
11 May 2023
Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
28
27
0
19 Apr 2023
Energy-Efficient GPU Clusters Scheduling for Deep Learning
Diandian Gu
Xintong Xie
Gang Huang
Xin Jin
Xuanzhe Liu
GNN
29
7
0
13 Apr 2023
MuxFlow: Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters
Yihao Zhao
Xin Liu
Shufan Liu
Xiang Li
Yibo Zhu
Gang Huang
Xuanzhe Liu
Xin Jin
40
11
0
24 Mar 2023
KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
30
11
0
12 Oct 2022
GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation
O. Sýkora
P. Phothilimthana
Charith Mendis
Amir Yazdanbakhsh
GNN
50
21
0
08 Oct 2022
EasyScale: Accuracy-consistent Elastic Training for Deep Learning
Mingzhen Li
Wencong Xiao
Biao Sun
Hanyu Zhao
Hailong Yang
...
Xianyan Jia
Yi Liu
Yong Li
Wei Lin
D. Qian
29
7
0
30 Aug 2022
Learning to Schedule Multi-Server Jobs with Fluctuated Processing Speeds
Hailiang Zhao
Shuiguang Deng
Feiyi Chen
Yuxiang Cai
Schahram Dustdar
Albert Y. Zomaya
49
5
0
09 Apr 2022
Pathways: Asynchronous Distributed Dataflow for ML
P. Barham
Aakanksha Chowdhery
J. Dean
Sanjay Ghemawat
Steven Hand
...
Parker Schuh
Ryan Sepassi
Laurent El Shafey
C. A. Thekkath
Yonghui Wu
GNN
MoE
47
126
0
23 Mar 2022
HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments
Ji Liu
Zhihua Wu
Dianhai Yu
Yanjun Ma
Danlei Feng
Minxu Zhang
Xinxuan Wu
Xuefeng Yao
Dejing Dou
23
44
0
20 Nov 2021
Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card
Hsin-Hsuan Sung
Yuanchao Xu
Jiexiong Guan
Wei Niu
Shaoshan Liu
Bin Ren
Yanzhi Wang
Xipeng Shen
23
3
0
12 Oct 2021
Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem
Cheng Tan
Zhichao Li
Jian Zhang
Yunyin Cao
Sikai Qi
Zherui Liu
Yibo Zhu
Chuanxiong Guo
31
34
0
18 Sep 2021
A Runtime-Based Computational Performance Predictor for Deep Neural Network Training
Geoffrey X. Yu
Yubo Gao
P. Golikov
Gennady Pekhimenko
3DH
36
67
0
31 Jan 2021
VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Andrew Or
Haoyu Zhang
M. Freedman
22
9
0
20 Sep 2020