2401.09670
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
18 January 2024
Yinmin Zhong
Shengyu Liu
Junda Chen
Jianbo Hu
Yibo Zhu
Xuanzhe Liu
Xin Jin
Hao Zhang
Papers citing "DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving"
2 / 102 papers shown
Title
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
369
0
13 Mar 2023
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,826
0
17 Sep 2019