
CARMA: Collocation-Aware Resource Manager

26 August 2025
Ehsan Yousefzadeh-Asl-Miandoab
Reza Karimzadeh
Bulat Ibragimov
Florina M. Ciorba
Pınar Tözün
arXiv: 2508.19073 (abs · PDF · HTML) · GitHub (1★)
Main: 11 pages · 14 figures · 4 tables · Bibliography: 4 pages · Appendix: 3 pages
Abstract

GPUs running deep learning (DL) workloads are frequently underutilized. Collocating multiple DL training tasks on the same GPU can improve utilization but introduces two key risks: (1) out-of-memory (OOM) crashes for newly scheduled tasks, and (2) severe performance interference among co-running tasks, which can negate any throughput gains. These issues reduce system robustness, quality of service, and energy efficiency. We present CARMA, a task-level, collocation-aware resource management system for server-scale deployments. CARMA addresses collocation challenges via (1) fine-grained monitoring and bookkeeping of GPUs and a collocation risk analysis that filters out high-risk GPUs; (2) task placement policies that cap GPU utilization to avoid OOMs and limit interference; (3) integration of GPU memory need estimators for DL tasks to minimize OOMs during collocation; and (4) a lightweight recovery method that relaunches jobs crashed due to OOMs. Our evaluation on a DL training workload derived from real-world traces shows that CARMA uses GPUs more efficiently by making more informed collocation decisions: for the best-performing collocation policy, CARMA increases GPU streaming multiprocessor (SM) utilization by 54%, the parallelism achieved per SM by 61%, and memory use by 62%. This results in a ∼35% reduction in end-to-end execution time (makespan) and a ∼15% reduction in GPU energy consumption for this workload.
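
As a rough illustration of the kind of placement decision the abstract describes (filtering high-risk GPUs, capping utilization, and checking an estimated memory need before collocating), the sketch below is a minimal, hypothetical Python example. The data structures, thresholds, and function names are assumptions for illustration only and are not taken from CARMA's actual implementation.

```python
# Hypothetical collocation-aware placement filter, in the spirit of the
# policies the abstract describes; all names and thresholds are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GPUState:
    gpu_id: int
    sm_util: float       # monitored SM utilization, 0.0-1.0
    mem_used_gb: float   # currently used GPU memory
    mem_total_gb: float  # total GPU memory

def pick_gpu(gpus: list[GPUState],
             est_task_mem_gb: float,
             util_cap: float = 0.85,
             mem_headroom_gb: float = 1.0) -> Optional[int]:
    """Filter out high-risk GPUs, then collocate on the least-loaded survivor.

    A GPU is treated as high-risk if adding the task would leave less than
    `mem_headroom_gb` of free memory (OOM risk) or if its SM utilization
    already exceeds `util_cap` (interference risk).
    """
    safe = [
        g for g in gpus
        if g.mem_total_gb - g.mem_used_gb - est_task_mem_gb >= mem_headroom_gb
        and g.sm_util < util_cap
    ]
    if not safe:
        return None  # defer the task rather than risk an OOM crash
    # Prefer the GPU with the most free memory among the low-risk candidates.
    best = max(safe, key=lambda g: g.mem_total_gb - g.mem_used_gb)
    return best.gpu_id

# Example: two risky GPUs and one with headroom; the task needs ~6 GB.
gpus = [GPUState(0, 0.92, 30.0, 40.0),
        GPUState(1, 0.40, 10.0, 40.0),
        GPUState(2, 0.70, 36.0, 40.0)]
print(pick_gpu(gpus, est_task_mem_gb=6.0))  # -> 1
```

In this sketch, returning `None` when no GPU passes the risk filter stands in for deferring or queuing a task; the paper's recovery mechanism for tasks that do crash with OOM is not modeled here.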
