Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems

21 April 2023

Papers citing "Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems"

Title
EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge | Motahare Mounesan, Xiaojie Zhang, S. Debroy | 21 1 0 | 16 Oct 2024
Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling | Kamran Razavi, Saeid Ghafouri, Max Mühlhäuser, Pooyan Jamshidi, Lin Wang | 29 3 0 | 31 Mar 2024
Resource Allocation of Industry 4.0 Micro-Service Applications across Serverless Fog Federation | R. Hussain, Mohsen Amini Salehi | 25 12 0 | 14 Jan 2024
IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency | Saeid Ghafouri, Kamran Razavi, Mehran Salmani, Alireza Sanaee, T. Lorido-Botran, Lin Wang, Joseph Doyle, Pooyan Jamshidi | 35 2 0 | 24 Aug 2023