MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
arXiv:2406.17565 · 25 June 2024
Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan
Tags: LLMAG
Papers citing "MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool" (7 papers)
Cognitive Memory in Large Language Models (03 Apr 2025)
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
Tags: LLMAG, KELM
Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity (16 Feb 2025)
Junhao Hu, Wenrui Huang, Weidong Wang, Zhenwen Li, Tiancheng Hu, Zhixia Liu, Xusheng Chen, Tao Xie, Yizhou Shan
Tags: LRM
EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models (20 Oct 2024)
Junhao Hu, Wenrui Huang, Haoran Wang, Weidong Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie
Tags: RALM, LLMAG
Preble: Efficient Distributed Prompt Scheduling for LLM Serving (08 May 2024)
Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, Yiying Zhang
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU (13 Mar 2023)
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs (06 Oct 2022)
Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu
ReAct: Synergizing Reasoning and Acting in Language Models (06 Oct 2022)
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
Tags: LLMAG, ReLM, LRM