ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.06180
  4. Cited By
Efficient Memory Management for Large Language Model Serving with
  PagedAttention

Efficient Memory Management for Large Language Model Serving with PagedAttention

12 September 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
    VLM
ArXivPDFHTML

Papers citing "Efficient Memory Management for Large Language Model Serving with PagedAttention"

50 / 412 papers shown
Title
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages
Archchana Sindhujan
Diptesh Kanojia
Constantin Orasan
Shenbin Qian
38
2
0
08 Jan 2025
Information Extraction from Clinical Notes: Are We Ready to Switch to Large Language Models?
Information Extraction from Clinical Notes: Are We Ready to Switch to Large Language Models?
Yan Hu
X. Zuo
Yujia Zhou
Xueqing Peng
J. Huang
...
Ruey-Ling Weng
Qingyu Chen
Xiaoqian Jiang
Kirk Roberts
Hua Xu
LM&MA
41
4
0
08 Jan 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Yiyao Yu
Xinzhe Ni
Zicheng Lin
Jin Zeng
Yujiu Yang
LRM
83
14
0
08 Jan 2025
GeAR: Generation Augmented Retrieval
Haoyu Liu
Shaohan Huang
Jianfeng Liu
Yuefeng Zhan
H. Sun
Weiwei Deng
Feng Sun
Furu Wei
Qi Zhang
47
1
0
06 Jan 2025
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu
Meng Chen
Baotong Lu
Huiqiang Jiang
Zhenhua Han
...
Kaipeng Zhang
Chong Chen
Fan Yang
Yue Yang
Lili Qiu
60
30
0
03 Jan 2025
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
Han Zhang
Xiaoman Pan
Hongwei Wang
Kaixin Ma
Wenhao Yu
Dong Yu
LLMAG
69
3
0
03 Jan 2025
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye
Lequn Chen
Ruihang Lai
Wuwei Lin
Yineng Zhang
...
Tianqi Chen
Baris Kasikci
Vinod Grover
Arvind Krishnamurthy
Luis Ceze
67
21
0
02 Jan 2025
In-Context Learning with Iterative Demonstration Selection
In-Context Learning with Iterative Demonstration Selection
Chengwei Qin
Aston Zhang
Chong Chen
Anirudh Dagar
Wenming Ye
LRM
73
40
0
31 Dec 2024
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System
Hyucksung Kwon
Kyungmo Koo
Janghyeon Kim
W. Lee
Minjae Lee
...
Yongkee Kwon
Ilkon Kim
Euicheol Lim
John Kim
Jungwook Choi
74
4
0
28 Dec 2024
MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge
MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge
Jie He
Nan Hu
Wanqiu Long
Jiaoyan Chen
Jeff Z. Pan
ELM
LRM
102
6
0
22 Dec 2024
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval
Luo Ji
Feixiang Guo
Teng Chen
Qingqing Gu
Xiaoyu Wang
...
Peng Yu
Yue Zhao
Hongyang Lei
Zhonglin Jiang
Yong Chen
RALM
LRM
102
0
0
21 Dec 2024
Parallelized Autoregressive Visual Generation
Parallelized Autoregressive Visual Generation
Yunhong Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
95
12
0
19 Dec 2024
ProcessBench: Identifying Process Errors in Mathematical Reasoning
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Chujie Zheng
Zizhuo Zhang
Beichen Zhang
Runji Lin
Keming Lu
Bowen Yu
Dayiheng Liu
Jingren Zhou
Junyang Lin
LRM
134
52
0
09 Dec 2024
Unifying KV Cache Compression for Large Language Models with LeanKV
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
151
5
0
04 Dec 2024
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
Junjie Wen
Minjie Zhu
Bo Li
Zhibin Tang
Jinming Li
...
Chengmeng Li
Xiaoyu Liu
Chaomin Shen
Yaxin Peng
Feifei Feng
98
16
0
04 Dec 2024
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici
Davide Belli
M. V. Baalen
Amir Jalalirad
Andrii Skliar
Bence Major
Markus Nagel
Paul N. Whatmough
76
0
0
02 Dec 2024
On Domain-Specific Post-Training for Multimodal Large Language Models
On Domain-Specific Post-Training for Multimodal Large Language Models
Daixuan Cheng
Shaohan Huang
Ziyu Zhu
Xintong Zhang
Wayne Xin Zhao
Zhongzhi Luan
Bo Dai
Zhenliang Zhang
VLM
102
2
0
29 Nov 2024
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Akhiad Bercovich
Tomer Ronen
Talor Abramovich
Nir Ailon
Nave Assaf
...
Ido Shahaf
Oren Tropp
Omer Ullman Argov
Ran Zilberstein
Ran El-Yaniv
91
1
0
28 Nov 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
97
4
0
28 Nov 2024
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
77
0
0
20 Nov 2024
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri
Bartłomiej Cupiał
Samuel Coward
Ulyana Piterbarg
Maciej Wolczyk
...
Lerrel Pinto
Rob Fergus
Jakob Foerster
Jack Parker-Holder
Tim Rocktaschel
LLMAG
LRM
119
12
0
20 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
78
0
0
12 Nov 2024
GitChameleon: Unmasking the Version-Switching Capabilities of Code
  Generation Models
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
Nizar Islah
Justine Gehring
Diganta Misra
Eilif B. Muller
Irina Rish
Terry Yue Zhuo
Massimo Caccia
SyDa
50
1
0
05 Nov 2024
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
Wei Wu
Zhuoshi Pan
Chao Wang
L. Chen
Y. Bai
Kun Fu
Zehua Wang
Hui Xiong
Hui Xiong
LLMAG
44
5
0
05 Nov 2024
MdEval: Massively Multilingual Code Debugging
MdEval: Massively Multilingual Code Debugging
Shukai Liu
Linzheng Chai
Jian Yang
Jiajun Shi
He Zhu
...
Yu Hao
Liqun Yang
Guanglin Niu
Ge Zhang
Zheng Li
LRM
ELM
78
6
0
04 Nov 2024
Context Parallelism for Scalable Million-Token Inference
Context Parallelism for Scalable Million-Token Inference
Amy Yang
Jingyi Yang
Aya Ibrahim
Xinfeng Xie
Bangsheng Tang
Grigory Sizov
Jeremy Reizenstein
Jongsoo Park
Jianyu Huang
MoE
LRM
72
5
0
04 Nov 2024
SkyServe: Serving AI Models across Regions and Clouds with Spot Instances
SkyServe: Serving AI Models across Regions and Clouds with Spot Instances
Ziming Mao
Tian Xia
Zhanghao Wu
Wei-Lin Chiang
Tyler Griggs
Romil Bhardwaj
Zongheng Yang
S. Shenker
Ion Stoica
62
2
0
03 Nov 2024
Do Large Language Models Align with Core Mental Health Counseling Competencies?
Do Large Language Models Align with Core Mental Health Counseling Competencies?
Viet Cuong Nguyen
Mohammad Taher
Dongwan Hong
Vinicius Konkolics Possobom
Vibha Thirunellayi Gopalakrishnan
...
Zihang Li
H. J. Soled
Michael L. Birnbaum
Srijan Kumar
M. D. Choudhury
ELM
LM&MA
AI4MH
39
3
0
29 Oct 2024
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song
Zihang Zhong
Rong Chen
Haibo Chen
MoE
68
4
0
29 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
84
5
0
28 Oct 2024
Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation
Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation
Mufei Li
Siqi Miao
Pan Li
RALM
38
9
0
28 Oct 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Yiyuan Ma
Wenlei Bao
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
88
16
0
28 Oct 2024
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu
Zhiwei He
Xiaofeng Wang
Pengfei Liu
Rui Wang
OSLM
64
3
0
24 Oct 2024
Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs
Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs
Ferdi Kossmann
Bruce Fontaine
Daya Khudia
Michael Cafarella
Samuel Madden
143
2
0
23 Oct 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Rameswar Panda
OffRL
85
6
0
23 Oct 2024
ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage
ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage
Taewhoo Lee
Chanwoong Yoon
Kyochul Jang
Donghyeon Lee
Minju Song
Hyunjae Kim
Jaewoo Kang
ELM
35
1
0
22 Oct 2024
PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles
PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles
Li Siyan
Vethavikashini Chithrra Raghuram
Omar Khattab
Julia Hirschberg
Zhou Yu
34
9
0
22 Oct 2024
ToW: Thoughts of Words Improve Reasoning in Large Language Models
ToW: Thoughts of Words Improve Reasoning in Large Language Models
Zhikun Xu
Ming shen
Jacob Dineen
Zhaonan Li
Xiao Ye
Shijie Lu
Aswin Rrv
Chitta Baral
Ben Zhou
LRM
221
1
0
21 Oct 2024
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
Shaonan Wu
Shuai Lu
Y. Gong
Nan Duan
Ping Wei
AIMat
48
0
0
21 Oct 2024
EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models
EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models
Junhao Hu
Wenrui Huang
Haoran Wang
Weidong Wang
Tiancheng Hu
Qin Zhang
Hao Feng
Xusheng Chen
Yizhou Shan
Tao Xie
RALM
LLMAG
42
4
0
20 Oct 2024
Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event
  Representation
Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation
Safeyah Khaled Alshemali
Daniel Bauer
Yuval Marton
BDL
215
0
0
19 Oct 2024
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
Seulbi Lee
J. Kim
Sangheum Hwang
LRM
205
0
0
19 Oct 2024
Montessori-Instruct: Generate Influential Training Data Tailored for
  Student Learning
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Xiaochuan Li
Zichun Yu
Chenyan Xiong
SyDa
41
1
0
18 Oct 2024
GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings
GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings
Raghuveer Thirukovalluru
Bhuwan Dhingra
37
2
0
18 Oct 2024
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
You Wu
Haoyi Wu
Kewei Tu
34
3
0
18 Oct 2024
Boosting LLM Translation Skills without General Ability Loss via
  Rationale Distillation
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation
Junhong Wu
Yang Zhao
Yangyifan Xu
Bing Liu
Chengqing Zong
CLL
47
1
0
17 Oct 2024
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
Nandan Thakur
Suleman Kazi
Ge Luo
Jimmy J. Lin
Amin Ahmad
VLM
RALM
30
7
0
17 Oct 2024
Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?
Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?
Qisheng Hu
Quanyu Long
Wenya Wang
200
5
0
17 Oct 2024
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao Mark Chen
Fuwen Tan
Alexandros Kouris
Royson Lee
Hongxiang Fan
Stylianos I. Venieris
MQ
39
1
0
17 Oct 2024
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae
Namyoung Kim
Kai Tzu-iunn Ong
Minju Gwak
Gwanwoo Song
Jihoon Kim
S. Kim
Dongha Lee
Jinyoung Yeo
LLMAG
33
15
0
17 Oct 2024
Previous
123456789
Next