Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.06180
Cited By
Efficient Memory Management for Large Language Model Serving with PagedAttention
12 September 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Efficient Memory Management for Large Language Model Serving with PagedAttention"
12 / 412 papers shown
Title
Punica: Multi-Tenant LoRA Serving
Lequn Chen
Zihao Ye
Yongji Wu
Danyang Zhuo
Luis Ceze
Arvind Krishnamurthy
44
34
0
28 Oct 2023
RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
Zefan Wang
Zichuan Liu
Yingying Zhang
Aoxiao Zhong
Lunting Fan
Lingfei Wu
Qingsong Wen
46
24
0
25 Oct 2023
AutoMix: Automatically Mixing Language Models
Pranjal Aggarwal
Aman Madaan
Ankit Anand
Srividya Pranavi Potharaju
Swaroop Mishra
...
Karthik Kappaganthu
Yiming Yang
Shyam Upadhyay
Manaal Faruqui
Mausam
42
19
0
19 Oct 2023
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
Wenqi Jiang
Marco Zeller
R. Waleffe
Torsten Hoefler
Gustavo Alonso
54
14
0
15 Oct 2023
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
Hung Le
Hailin Chen
Amrita Saha
Akash Gokul
Doyen Sahoo
Chenyu You
LRM
28
42
0
13 Oct 2023
Online Speculative Decoding
Xiaoxuan Liu
Lanxiang Hu
Peter Bailis
Alvin Cheung
Zhijie Deng
Ion Stoica
Hao Zhang
29
53
0
11 Oct 2023
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Chia-Yuan Chang
Xia Hu
35
11
0
01 Oct 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
372
0
13 Mar 2023
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,872
0
18 Apr 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
177
417
0
18 Jan 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,833
0
17 Sep 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhehuai Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
718
6,750
0
26 Sep 2016
Previous
1
2
3
4
5
6
7
8
9