arXiv:2504.08784
SLOs-Serve: Optimized Serving of Multi-SLO LLMs
5 April 2025
Siyuan Chen, Zhipeng Jia, S. Khan, Arvind Krishnamurthy, Phillip B. Gibbons
Papers citing "SLOs-Serve: Optimized Serving of Multi-SLO LLMs"
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Y. Li, ..., Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng
81
1
0
06 May 2025
Patchwork: A Unified Framework for RAG Serving
Bodun Hu, Luis Pabon, Saurabh Agarwal, Aditya Akella
86
0
0
01 May 2025
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar
VLM
164
29
0
07 May 2024