Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction (arXiv:2404.08509)
12 April 2024
Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Basar, Ravishankar K. Iyer
Papers citing "Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction" (6 papers)
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor
Seungbeom Choi, Jeonghoe Goo, Eunjoo Jeon, Mingyu Yang, Minsung Jang
14 May 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Zhengyuan Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zehao Wang, Baoxing Huai, M. Zhang
28 Apr 2025
Tempo: Application-aware LLM Serving with Mixed SLO Requirements
Wei Zhang, Zhiyu Wu, Yi Mu, Banruo Liu, Myungjin Lee, Fan Lai
24 Apr 2025
Efficient LLM Scheduling by Learning to Rank
Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang
28 Aug 2024
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
Sania Nayab, Giulio Rossolini, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli
29 Jul 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
03 Feb 2024