Efficient LLM Scheduling by Learning to Rank

28 August 2024
Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang
arXiv:2408.15792

Papers citing "Efficient LLM Scheduling by Learning to Rank"

14 of 14 papers shown.

Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Zheng Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zehao Wang, Baoxing Huai, M. Zhang
28 Apr 2025 (LLMAG)

Tempo: Application-aware LLM Serving with Mixed SLO Requirements
Wei Zhang, Zhiyu Wu, Yi Mu, Banruo Liu, Myungjin Lee, Fan Lai
24 Apr 2025

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Yinmin Zhong, Zili Zhang, Xiaoniu Song, Hanpeng Hu, Chao Jin, ..., Changyi Wan, Hongyu Zhou, Yimin Jiang, Yibo Zhu, Daxin Jiang
22 Apr 2025 (OffRL, AI4TS)

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints
Ruicheng Ao, Gan Luo, D. Simchi-Levi, Xinshang Wang
15 Apr 2025

Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
Shihong Gao, Xuzhi Zhang, Yanyan Shen, Lei Chen
10 Apr 2025

Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation
Jingzhi Fang, Yanyan Shen, Y. Wang, Lei Chen
21 Mar 2025

Queueing, Predictions, and LLMs: Challenges and Open Problems
Michael Mitzenmacher, Rana Shahout
10 Mar 2025 (AI4TS, LRM)

Efficiently Serving LLM Reasoning Programs with Certaindex
Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, Hao Zhang
31 Dec 2024 (LRM)

Multi-Bin Batching for Increasing LLM Inference Throughput
Ozgur Guldogan, Jackson Kunde, Kangwook Lee, Ramtin Pedarsani
03 Dec 2024 (LRM)

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching
Yilong Zhao, Shuo Yang, Kan Zhu, Lianmin Zheng, Baris Kasikci, Yang Zhou, Jiarong Xing, Ion Stoica
25 Nov 2024

Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments
Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas
24 Nov 2024

Revisiting SLO and Goodput Metrics in LLM Serving
Zhibin Wang, Shipeng Li, Yuhang Zhou, Xue Li, Rong Gu, Nguyen Cam-Tu, Chen Tian, Sheng Zhong
18 Oct 2024

Don't Stop Me Now: Embedding Based Scheduling for LLMs
Rana Shahout, Eran Malach, Chunwei Liu, Weifan Jiang, Minlan Yu, Michael Mitzenmacher
01 Oct 2024 (AI4TS)

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019 (MoE)