Deep Neural Networks (DNNs) are useful in many applications, including transportation, healthcare, and speech recognition. Despite various efforts to improve accuracy, few works have studied DNN in the context of real-time requirements. Coarse resource allocation and sequential execution in existing frameworks result in underutilization. In this work, we conduct GPU speedup gain analysis and propose SGPRS, the first real-time GPU scheduler considering zero configuration partition switch. The proposed scheduler not only meets more deadlines for parallel tasks but also sustains overall performance beyond the pivot point.
View on arXiv