Dynamic Space-Time Scheduling for GPU Inference
arXiv: 1901.00041
31 December 2018
Paras Jain, Xiangxi Mo, Ajay Jain, Harikaran Subbaraj, Rehana Durrani, Alexey Tumanov, Joseph E. Gonzalez, Ion Stoica
Papers citing "Dynamic Space-Time Scheduling for GPU Inference" (19 papers shown)
CascadeServe: Unlocking Model Cascades for Inference Serving. Ferdi Kossmann, Ziniu Wu, Alex Turk, Nesime Tatbul, Lei Cao, Samuel Madden. 20 Jun 2024.
Hydro: Adaptive Query Processing of ML Queries. Gaurav Tarlok Kakkar, Jiashen Cao, Aubhro Sengupta, Joy Arulraj, Hyesoon Kim. 22 Mar 2024.
A Survey of Serverless Machine Learning Model Inference. Kamil Kojs. 22 Nov 2023.
Throughput Maximization of DNN Inference: Batching or Multi-Tenancy? Seyed Morteza Nabavinejad, M. Ebrahimi, Sherief Reda. 26 Aug 2023.
Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU. Zhihe Zhao, Neiwen Ling, Nan Guan, Guoliang Xing. 10 Jul 2023.
D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs. Aditya Dhakal, Sameer G. Kulkarni, K. Ramakrishnan. 31 Mar 2023.
A Study on the Intersection of GPU Utilization and CNN Inference. J. Kosaian, Amar Phanishayee. 15 Dec 2022.
iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud. Fei Xu, Jianian Xu, Jiabin Chen, Li Chen, Ruitao Shang, Zhi Zhou, Fengyuan Liu. 03 Nov 2022.
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision. Wei Gao, Qi Hu, Zhisheng Ye, Peng Sun, Xiaolin Wang, Yingwei Luo, Tianwei Zhang, Yonggang Wen. 24 May 2022.
Batched matrix operations on distributed GPUs with application in theoretical physics. Nenad Mijić, Davor Davidović. 17 Mar 2022.
Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads. Guin Gilman, R. Walls. 01 Oct 2021.
Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning. S. Choi, Sunho Lee, Yeonjae Kim, Jongse Park, Youngjin Kwon, Jaehyuk Huh. 01 Sep 2021.
Boggart: Towards General-Purpose Acceleration of Retrospective Video Analytics. Neil Agarwal, Ravi Netravali. 21 Jun 2021.
Contention-Aware GPU Partitioning and Task-to-Partition Allocation for Real-Time Workloads. Houssam-Eddine Zahaf, Ignacio Sañudo Olmedo, Jayati Singh, Nicola Capodieci, Sébastien Faucou. 21 May 2021.
Accelerating Multi-Model Inference by Merging DNNs of Different Weights. Joo Seong Jeong, Soojeong Kim, Gyeong-In Yu, Yunseong Lee, Byung-Gon Chun. 28 Sep 2020.
Spatial Sharing of GPU for Autotuning DNN models. Aditya Dhakal, Junguk Cho, Sameer G. Kulkarni, K. Ramakrishnan, P. Sharma. 08 Aug 2020.
Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models. Matthew LeMay, Shijian Li, Tian Guo. 05 Dec 2019.
INFaaS: A Model-less and Managed Inference Serving System. Francisco Romero, Qian Li, N. Yadwadkar, Christos Kozyrakis. 30 May 2019.
The OoO VLIW JIT Compiler for GPU Inference. Paras Jain, Xiangxi Mo, Ajay Jain, Alexey Tumanov, Joseph E. Gonzalez, Ion Stoica. 28 Jan 2019.