Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency
arXiv:2505.17074 · 20 May 2025
Ruixiao Li, Fahao Chen, Peng Li
Papers citing "Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency" (8 of 8 shown):
- Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction — Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer (12 Apr 2024)
- ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference — Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo (15 Mar 2024)
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads — Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao (19 Jan 2024)
- Accelerating LLM Inference with Staged Speculative Decoding — Benjamin Spector, Christopher Ré (08 Aug 2023)
- Large Language Models — Michael R. Douglas (11 Jul 2023)
- Fast Inference from Transformers via Speculative Decoding — Yaniv Leviathan, Matan Kalman, Yossi Matias (30 Nov 2022)
- Program Synthesis with Large Language Models — Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, ..., Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, Charles Sutton (16 Aug 2021)
- Language Models are Few-Shot Learners — Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei (28 May 2020)