FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping
arXiv:2306.03622 · 6 June 2023
Minchen Yu, Ao Wang, Dong-dong Chen, Haoxuan Yu, Xiaonan Luo, Zhuohao Li, Wei Wang, Ruichuan Chen, Dapeng Nie, Haoran Yang
Papers citing "FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping" (6 papers)
SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
Kaiyu Huang, Yu Wang, Zhubo Shi, Han Zou, Minchen Yu, Qingjiang Shi (07 Mar 2025)
ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models
Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai (25 Jan 2024)
A Survey of Serverless Machine Learning Model Inference
Kamil Kojs (22 Nov 2023)
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang (13 Mar 2023)
FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute
Ao Wang, Shuai Chang, Huangshi Tian, Hongqi Wang, Haoran Yang, Huiba Li, Rui Du, Yue Cheng (24 May 2021)
Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider
Mohammad Shahrad, Rodrigo Fonseca, Íñigo Goiri, G. Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, M. Russinovich, Ricardo Bianchini (06 Mar 2020)