Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.12391
Cited By
LLM Inference Serving: Survey of Recent Advances and Opportunities
17 July 2024
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LLM Inference Serving: Survey of Recent Advances and Opportunities"
11 / 11 papers shown
Title
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
Haolin Zhang
Jeff Huang
35
0
0
09 May 2025
Unveiling the Landscape of LLM Deployment in the Wild: An Empirical Study
Xinyi Hou
Jiahao Han
Yanjie Zhao
Haoyu Wang
41
0
0
05 May 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
62
0
0
06 Mar 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur
G. Oliva
Dayi Lin
Ahmed E. Hassan
46
1
0
28 Jan 2025
DeServe: Towards Affordable Offline LLM Inference via Decentralization
Linyu Wu
Xiaoyuan Liu
Tianneng Shi
Zhe Ye
D. Song
OffRL
47
0
0
28 Jan 2025
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Chaofan Lin
Zhenhua Han
Chengruidong Zhang
Yuqing Yang
Fan Yang
Chen Chen
Lili Qiu
87
38
0
30 May 2024
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
Bin Gao
Zhuomin He
Puru Sharma
Qingxuan Kang
Djordje Jevdjic
Junbo Deng
Xingkun Yang
Zhou Yu
Pengfei Zuo
71
45
0
23 Mar 2024
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
Xupeng Miao
Gabriele Oliaro
Xinhao Cheng
Vineeth Kada
Ruohan Gao
...
April Yang
Yingcheng Wang
Mengdi Wu
Colin Unger
Zhihao Jia
MoE
94
9
0
29 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
133
144
0
03 Feb 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
371
0
13 Mar 2023
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Haiyang Huang
Newsha Ardalani
Anna Y. Sun
Liu Ke
Hsien-Hsin S. Lee
Anjali Sridhar
Shruti Bhosale
Carole-Jean Wu
Benjamin C. Lee
MoE
70
23
0
10 Mar 2023
1