arXiv:1712.06139
TensorFlow-Serving: Flexible, High-Performance ML Serving
17 December 2017
Christopher Olston
Noah Fiedel
Kiril Gorovoy
Jeremiah Harmsen
Li Lao
Fangwei Li
Vinu Rajashekhar
Sukriti Ramesh
Jordan Soyke
Papers citing
"TensorFlow-Serving: Flexible, High-Performance ML Serving"
33 / 33 papers shown
Title
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor
Seungbeom Choi
Jeonghoe Goo
Eunjoo Jeon
Mingyu Yang
Minsung Jang
23
0
0
14 May 2025
LithOS: An Operating System for Efficient Machine Learning on GPUs
Patrick H. Coppock
Brian Zhang
Eliot H. Solomon
Vasilis Kypriotis
Leon Yang
Bikash Sharma
Dan Schatzberg
Todd C. Mowry
Dimitrios Skarlatos
42
0
0
21 Apr 2025
Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling
Sohaib Ahmad
Hui Guan
Ramesh K. Sitaraman
42
4
0
04 Jul 2024
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Yujeong Choi
Jiin Kim
Minsoo Rhu
41
1
0
11 Jun 2024
SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
Mohanad Odema
Luke Chen
Hyoukjun Kwon
Mohammad Abdullah Al Faruque
44
4
0
01 May 2024
Combining Cloud and Mobile Computing for Machine Learning
Ruiqi Xu
Tianchi Zhang
44
1
0
20 Jan 2024
Model Share AI: An Integrated Toolkit for Collaborative Machine Learning Model Development, Provenance Tracking, and Deployment in Python
Heinrich Peters
Michael Parrott
24
0
0
27 Sep 2023
Naeural AI OS -- Decentralized ubiquitous computing MLOps execution engine
Beatrice Milik
S. Saraev
Cristian Bleotiu
Radu Lupaescu
19
0
0
14 Jun 2023
S³: Increasing GPU Utilization during Generative Inference for Higher Throughput
Yunho Jin
Chun-Feng Wu
David Brooks
Gu-Yeon Wei
39
62
0
09 Jun 2023
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Yujeong Choi
John Kim
Minsoo Rhu
21
1
0
23 Feb 2023
Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning
Gabriele Castellano
J. Nieto
Jordi Luque
Ferran Diego
Carlos Segura
Diego Perino
Flavio Esposito
Fulvio Risso
Aravindh Raman
19
0
0
31 Jan 2023
Improving the Performance of DNN-based Software Services using Automated Layer Caching
M. Abedi
Yanni Iouannou
Pooyan Jamshidi
Hadi Hemmati
28
0
0
18 Sep 2022
An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks
Pierrick Pochelu
S. Petiton
B. Conche
14
2
0
30 Aug 2022
Machine Learning with DBOS
R. Redmond
Nathan Weckwerth
Brian Xia
Qian Li
Peter Kraft
Deeptaanshu Kumar
Çağatay Demiralp
Michael Stonebraker
OOD
28
0
0
10 Aug 2022
Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures
Yongji Wu
Matthew Lentz
Danyang Zhuo
Yao Lu
34
22
0
10 May 2022
CapillaryX: A Software Design Pattern for Analyzing Medical Images in Real-time using Deep Learning
Maged Abdalla Helmy Abdou
Paulo Ferreira
E. Jul
T. Truong
29
1
0
13 Apr 2022
GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge
Arthi Padmanabhan
Neil Agarwal
Anand Iyer
Ganesh Ananthanarayanan
Yuanchao Shu
Nikolaos Karianakis
G. Xu
Ravi Netravali
43
59
0
19 Jan 2022
Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem
Cheng Tan
Zhichao Li
Jian Zhang
Yunyin Cao
Sikai Qi
Zherui Liu
Yibo Zhu
Chuanxiong Guo
31
34
0
18 Sep 2021
Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning
S. Choi
Sunho Lee
Yeonjae Kim
Jongse Park
Youngjin Kwon
Jaehyuk Huh
30
21
0
01 Sep 2021
JIZHI: A Fast and Cost-Effective Model-As-A-Service System for Web-Scale Online Inference at Baidu
Hao Liu
Qian Gao
Jiang Li
X. Liao
Hao Xiong
...
Guobao Yang
Zhiwei Zha
Daxiang Dong
Dejing Dou
Haoyi Xiong
VLM
30
22
0
03 Jun 2021
Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks
Dequan Wang
An Ju
Evan Shelhamer
David Wagner
Trevor Darrell
AAML
26
27
0
18 May 2021
DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge
Zhe Yang
Klara Nahrstedt
Hongpeng Guo
Qian Zhou
19
21
0
05 May 2021
Accelerating Deep Learning Inference via Learned Caches
Arjun Balasubramanian
Adarsh Kumar
Yuhan Liu
Han Cao
Shivaram Venkataraman
Aditya Akella
28
18
0
18 Jan 2021
PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment
Meghana Madhyastha
Kunal Lillaney
J. Browne
Joshua T. Vogelstein
Randal C. Burns
30
1
0
10 Nov 2020
Understanding Capacity-Driven Scale-Out Neural Recommendation Inference
Michael Lui
Yavuz Yetim
Özgür Özkan
Zhuoran Zhao
Shin-Yeh Tsai
Carole-Jean Wu
Mark Hempstead
GNN
BDL
LRM
22
51
0
04 Nov 2020
HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units
Linda Qiao
Yanbo Xu
Alind Khare
Satria Priambada
K. Maher
Alaa Aljiffry
Jimeng Sun
Alexey Tumanov
OOD
31
84
0
10 Aug 2020
Real-Time Video Inference on Edge Devices via Adaptive Model Streaming
Mehrdad Khani Shirkoohi
Pouya Hamadanian
Arash Nasr-Esfahany
Mohammad Alizadeh
28
44
0
11 Jun 2020
AIBench Scenario: Scenario-distilling AI Benchmarking
Wanling Gao
Fei Tang
Jianfeng Zhan
Xu Wen
Lei Wang
Zheng Cao
Chuanxin Lan
Chunjie Luo
Xiaoli Liu
Zihan Jiang
29
14
0
06 May 2020
Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems
Siyuan Zhuang
Zhuohan Li
Danyang Zhuo
Stephanie Wang
Eric Liang
Robert Nishihara
Philipp Moritz
Ion Stoica
27
23
0
13 Feb 2020
The Design and Implementation of a Scalable DL Benchmarking Platform
Cheng-rong Li
Abdul Dakkak
Jinjun Xiong
Wen-mei W. Hwu
ALM
ELM
21
4
0
19 Nov 2019
Dynamic Space-Time Scheduling for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Harikaran Subbaraj
Rehana Durrani
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
35
64
0
31 Dec 2018
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments
Abdul Dakkak
Cheng-rong Li
Simon Garcia De Gonzalo
Jinjun Xiong
Wen-mei W. Hwu
21
19
0
24 Nov 2018
IDK Cascades: Fast Deep Learning by Learning not to Overthink
Xin Wang
Yujia Luo
D. Crankshaw
Alexey Tumanov
Fisher Yu
Joseph E. Gonzalez
35
107
0
03 Jun 2017