2408.13510
Cited By
Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling
24 August 2024
Kunal Jain
Anjaly Parayil
Ankur Mallick
Esha Choukse
Xiaoting Qin
Jue Zhang
Íñigo Goiri
Rujia Wang
Chetan Bansal
Victor Rühle
Anoop Kulkarni
Steve Kofsky
Saravan Rajmohan
Papers citing
"Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling"
12 / 12 papers shown
Title
GenTorrent: Scaling Large Language Model Serving with An Overlay Network
Fei Fang
Yifan Hua
Shengze Wang
Ruilin Zhou
Y. Liu
Chen Qian
Wei Wei
89
0
0
27 Apr 2025
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu
Ajay Nayak
Jayashree Mohan
Ramachandran Ramjee
Ashish Panwar
VLM
107
27
0
07 May 2024
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
Haoran Qiu
Weichao Mao
Archit Patke
Shengkun Cui
Saurabh Jha
Chen Wang
Hubertus Franke
Zbigniew T. Kalbarczyk
Tamer Basar
Ravishankar K. Iyer
49
26
0
12 Apr 2024
Learned Best-Effort LLM Serving
Siddharth Jha
Coleman Hooper
Xiaoxuan Liu
Sehoon Kim
Kurt Keutzer
34
2
0
15 Jan 2024
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector
Christopher Ré
63
107
0
08 Aug 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
124
289
0
24 Jun 2023
S$^{3}$: Increasing GPU Utilization during Generative Inference for Higher Throughput
Yunho Jin
Chun-Feng Wu
David Brooks
Gu-Yeon Wei
70
67
0
09 Jun 2023
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
205
5,513
0
07 Jul 2021
Heuristic-Guided Reinforcement Learning
Ching-An Cheng
Andrey Kolobov
Adith Swaminathan
OffRL
59
62
0
05 Jun 2021
Towards a Human-like Open-Domain Chatbot
Daniel De Freitas
Minh-Thang Luong
David R. So
Jamie Hall
Noah Fiedel
...
Zi Yang
Apoorv Kulshreshtha
Gaurav Nemade
Yifeng Lu
Quoc V. Le
91
935
0
27 Jan 2020
ELI5: Long Form Question Answering
Angela Fan
Yacine Jernite
Ethan Perez
David Grangier
Jason Weston
Michael Auli
AI4MH
ELM
82
617
0
22 Jul 2019
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
252
8,124
0
16 Jun 2016