Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction

12 April 2024

Papers citing "Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction"


Title | Authors | Tag | Counts | Date
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor | Seungbeom Choi, Jeonghoe Goo, Eunjoo Jeon, Mingyu Yang, Minsung Jang | - | 21 0 0 | 14 May 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving | Ranran Zhen, J. Li, Yixin Ji, Zhengyuan Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zehao Wang, Baoxing Huai, M. Zhang | LLMAG | 77 0 0 | 28 Apr 2025
Tempo: Application-aware LLM Serving with Mixed SLO Requirements | Wei Zhang, Zhiyu Wu, Yi Mu, Banruo Liu, Myungjin Lee, Fan Lai | - | 58 0 0 | 24 Apr 2025
Efficient LLM Scheduling by Learning to Rank | Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang | - | 55 19 0 | 28 Aug 2024
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | Sania Nayab, Giulio Rossolini, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli | LRM | 59 24 0 | 29 Jul 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang | - | 130 141 0 | 03 Feb 2024