Versions: v1, v2 (latest)

CARMA: Collocation-Aware Resource Manager

26 August 2025

Ehsan Yousefzadeh-Asl-Miandoab


Main: 11 pages; Bibliography: 4 pages; Appendix: 3 pages; 14 figures; 4 tables

Abstract

GPUs running deep learning (DL) workloads are frequently underutilized. Collocating multiple DL training tasks on the same GPU can improve utilization, but it introduces two key risks: (1) out-of-memory (OOM) crashes for newly scheduled tasks, and (2) severe performance interference among co-running tasks, which can negate any throughput gains. These issues reduce system robustness, quality of service, and energy efficiency. We present CARMA, a task-level, collocation-aware resource management system at server scale. CARMA addresses these collocation challenges through (1) fine-grained monitoring and bookkeeping of GPUs, together with a collocation risk analysis that filters out high-risk GPUs; (2) task placement policies that cap GPU utilization to avoid OOMs and limit interference; (3) integration of GPU memory-need estimators for DL tasks to minimize OOMs during collocation; and (4) a lightweight recovery method that relaunches jobs that crashed due to OOMs. Our evaluation on a DL training workload derived from real-world traces shows that CARMA uses GPUs more efficiently by making more informed collocation decisions: with the best-performing collocation policy, CARMA increases GPU streaming multiprocessor (SM) utilization by 54%, the parallelism achieved per SM by 61%, and memory use by 62%. For this workload, this yields reductions of about 35% in end-to-end execution time (makespan) and about 15% in GPU energy consumption.
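The abstract names three cooperating mechanisms: a risk filter over GPU bookkeeping, a utilization-capped placement policy fed by a memory-need estimate, and OOM-triggered relaunch. The abstract does not give CARMA's actual policies, thresholds, or estimators, so the following is only a minimal sketch of how such pieces could fit together. Everything in it is a hypothetical assumption for illustration: the names `GPUState`, `Task`, `is_high_risk`, `place`, and `on_oom`; the 0.8 SM-utilization cap; the 1 GB memory safety margin; the "least-utilized GPU first" tie-breaking; and the 1.5x estimate bump on relaunch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GPUState:
    """Hypothetical per-GPU bookkeeping record (fields are illustrative assumptions)."""
    gpu_id: int
    total_mem_gb: float
    used_mem_gb: float
    sm_util: float  # recent SM utilization in [0.0, 1.0]
    tasks: List[str] = field(default_factory=list)

    @property
    def free_mem_gb(self) -> float:
        return self.total_mem_gb - self.used_mem_gb


@dataclass
class Task:
    name: str
    est_mem_gb: float  # assumed output of some GPU memory-need estimator
    retries: int = 0


def is_high_risk(gpu: GPUState, task: Task, util_cap: float, mem_margin_gb: float) -> bool:
    """Hypothetical risk check: the GPU is too busy, or the task may not fit in memory."""
    too_busy = gpu.sm_util >= util_cap
    may_oom = task.est_mem_gb + mem_margin_gb > gpu.free_mem_gb
    return too_busy or may_oom


def place(task: Task, gpus: List[GPUState],
          util_cap: float = 0.8, mem_margin_gb: float = 1.0) -> Optional[GPUState]:
    """Filter out high-risk GPUs, then pick one with an example policy (assumed)."""
    candidates = [g for g in gpus if not is_high_risk(g, task, util_cap, mem_margin_gb)]
    if not candidates:
        return None  # no safe GPU right now: leave the task queued
    # Example policy: least-utilized GPU first, ties broken by most free memory.
    best = min(candidates, key=lambda g: (g.sm_util, -g.free_mem_gb))
    best.used_mem_gb += task.est_mem_gb  # update bookkeeping with the estimate
    best.tasks.append(task.name)
    return best


def on_oom(task: Task, gpu: GPUState, queue: List[Task],
           bump: float = 1.5, max_retries: int = 3) -> bool:
    """Hypothetical lightweight recovery: release bookkeeping, requeue with a larger estimate."""
    if task.name in gpu.tasks:
        # The crashed task no longer holds memory on this GPU.
        gpu.tasks.remove(task.name)
        gpu.used_mem_gb -= task.est_mem_gb
    if task.retries >= max_retries:
        return False  # give up after too many relaunches
    task.retries += 1
    task.est_mem_gb *= bump  # be more conservative on the next placement
    queue.append(task)
    return True
```

For example, with `gpus = [GPUState(0, 40.0, 30.0, 0.9), GPUState(1, 40.0, 12.0, 0.5), GPUState(2, 40.0, 5.0, 0.6)]`, calling `place(Task("job-a", 10.0), gpus)` skips GPU 0 (utilization at or above the cap) and selects GPU 1, the least-utilized GPU with enough free memory. Growing the estimate on relaunch is one simple way to make a retry less likely to OOM again; a real system could instead consult a better estimator or fall back to exclusive placement.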

