ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Efficient Memory Management for Large Language Model Serving with PagedAttention

12 September 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
    VLM
arXiv: 2309.06180

Papers citing "Efficient Memory Management for Large Language Model Serving with PagedAttention"

50 / 402 papers shown
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw
ELM
31
0
0
19 May 2025
GuRE:Generative Query REwriter for Legal Passage Retrieval
Daehee Kim
Deokhyung Kang
Jonghwi Kim
Sangwon Ryu
Gary Geunbae Lee
RALM
AILaw
22
0
0
19 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
Yangou Ouyang
...
Feiyu Xiong
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
7
0
0
18 May 2025
Token-Level Uncertainty Estimation for Large Language Model Reasoning
Tunyu Zhang
Haizhou Shi
Yibin Wang
Hengyi Wang
Xiaoxiao He
...
Ligong Han
Kai Xu
Huatian Zhang
Dimitris N. Metaxas
Hao Wang
LRM
14
0
0
16 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su
Yuechi Zhou
Quantong Qiu
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
22
0
0
16 May 2025
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELM
AI4MH
LM&MA
LRM
25
0
0
16 May 2025
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
Hao Mark Chen
Guanxi Lu
Yasuyuki Okoshi
Zhiwen Mo
Masato Motomura
Hongxiang Fan
LRM
7
0
0
16 May 2025
CAMEO: Collection of Multilingual Emotional Speech Corpora
Iwona Christop
Maciej Czajka
19
0
0
16 May 2025
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yinsicheng Jiang
Yao Fu
Yeqi Huang
Ping Nie
Zhan Lu
...
Dayou Du
Tairan Xu
Kai Zou
Edoardo Ponti
Luo Mai
MoE
22
0
0
16 May 2025
Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models
Camille Couturier
Spyros Mastorakis
Haiying Shen
Saravan Rajmohan
Victor Rühle
KELM
17
0
0
16 May 2025
Improve Rule Retrieval and Reasoning with Self-Induction and Relevance ReEstimate
Ziyang Huang
Wangtao Sun
Jun Zhao
Kang Liu
LRM
17
0
0
16 May 2025
The Hitchhikers Guide to Production-ready Trustworthy Foundation Model powered Software (FMware)
Kirill Vasilevski
Benjamin Rombaut
Gopi Krishnan Rajbahadur
G. Oliva
Keheliya Gallaba
...
Bouyan Chen
Kishanthan Thangarajah
Ahmed E. Hassan
Zhen Ming Jiang
22
0
0
15 May 2025
Analog Foundation Models
Julian Büchel
Iason Chalas
Giovanni Acampa
An Chen
Omobayode Fagbohungbe
Sidney Tsai
Kaoutar El Maghraoui
Manuel Le Gallo
Abbas Rahimi
Abu Sebastian
MQ
35
0
0
14 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
31
0
0
14 May 2025
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor
Seungbeom Choi
Jeonghoe Goo
Eunjoo Jeon
Mingyu Yang
Minsung Jang
21
0
0
14 May 2025
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
X. Huang
Weiwen Liu
Xingshan Zeng
Y. Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Yishuo Wang
R. Tang
Defu Lian
KELM
36
0
0
12 May 2025
On the Robustness of Reward Models for Language Model Alignment
Jiwoo Hong
Noah Lee
Eunki Kim
Guijin Son
Woojin Chung
Aman Gupta
Shao Tang
James Thorne
29
0
0
12 May 2025
Multimodal Survival Modeling in the Age of Foundation Models
Steven Song
Morgan Borjigin-Wang
Irene Madejski
Robert L. Grossman
26
0
0
12 May 2025
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
Arun S. Maiya
KELM
31
0
0
12 May 2025
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu
Jianian Zhu
Yong Li
Haojie Wang
Biao Hou
Jidong Zhai
43
0
0
12 May 2025
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang
Weijie Xu
Fanyou Wu
Chandan K. Reddy
31
0
0
12 May 2025
Learning from Peers in Reasoning Models
Tongxu Luo
Wenyu Du
Jiaxi Bi
Stephen Chung
Zhengyang Tang
Hao Yang
M. Zhang
Benyou Wang
LRM
41
0
0
12 May 2025
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
Shucheng Huang
Freda Shi
Chen Sun
Jiaming Zhong
Minghao Ning
Yufeng Yang
Yukun Lu
Hong Wang
A. Khajepour
33
0
0
11 May 2025
I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference
Zibo Gao
Junjie Hu
Feng Guo
Yixin Zhang
Yinglong Han
Siyuan Liu
Haiyang Li
Zhiqiang Lv
31
0
0
10 May 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
235
0
0
09 May 2025
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Zehao Fan
Garrett Gagnon
Zhenyu Liu
Liu Liu
29
0
0
09 May 2025
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris
Fan Grayson
Felix Feldman
Timothy Laurence
Toby Nonnenmacher
...
Leo Loman
Selina Patel
Thomas Finnie
Samuel Collins
Michael Borowitz
AI4MH
LM&MA
ELM
54
0
0
09 May 2025
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung
Jiachen Liu
Jeff J. Ma
Ruofan Wu
Oh Jun Kweon
Yuxuan Xia
Zhiyu Wu
Mosharaf Chowdhury
31
0
0
09 May 2025
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang
Tianyu Liu
Zhihong Zhu
Yu Wang
Haoyu Wang
Donghao Zhou
Yefeng Zheng
Kun Wang
X. Wu
Pheng-Ann Heng
ELM
41
0
0
09 May 2025
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
Massimiliano Pronesti
Joao Bettencourt-Silva
Paul Flanagan
Alessandra Pascale
Oisin Redmond
Anya Belz
Yufang Hou
38
0
0
09 May 2025
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
68
0
0
06 May 2025
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu
Jiarong Xing
Yifan Qiao
Mingyuan Ma
Y. Li
...
Shiyi Cao
Ke Bao
Ion Stoica
Harry Xu
Ying Sheng
34
0
0
06 May 2025
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Yushen Chen
Jiawei Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
38
0
0
05 May 2025
Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
Xueguang Ma
Luyu Gao
Shengyao Zhuang
Jiaqi Samantha Zhan
Jamie Callan
Jimmy Lin
199
0
0
05 May 2025
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
Arnab Sanyal
Prithwish Mukherjee
Gourav Datta
Sandeep P. Chinchali
MQ
198
0
0
05 May 2025
Adaptive Thinking via Mode Policy Optimization for Social Language Agents
Minzheng Wang
You Li
Haozhao Wang
Xinghua Zhang
Nan Xu
Bingli Wu
Fei Huang
Haiyang Yu
Wenji Mao
LLMAG
LRM
43
1
0
04 May 2025
Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
Xingyu Zhou
Wei Long
Jingbo Lu
Shiyin Jiang
Weiyi You
Haifeng Wu
Shuhang Gu
48
0
0
04 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
60
2
0
04 May 2025
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
M. Yuan
Jianye Hao
Feng Wu
ReLM
LRM
75
1
0
03 May 2025
PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding
Bradley McDanel
S. Zhang
Y. Hu
Zining Liu
MoE
187
0
0
02 May 2025
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin
Jingrong Chen
Xinhao Kong
Yongji Wu
Liang Luo
Zihan Wang
Ying Zhang
Tingjun Chen
Alvin R. Lebeck
Danyang Zhuo
160
0
0
02 May 2025
Multi-agents based User Values Mining for Recommendation
L. Chen
Wei Yuan
Tong Chen
Xiangyu Zhao
Nguyen Quoc Viet Hung
Hongzhi Yin
OffRL
52
0
0
02 May 2025
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
61
0
0
02 May 2025
Patchwork: A Unified Framework for RAG Serving
Bodun Hu
Luis Pabon
Saurabh Agarwal
Aditya Akella
26
0
0
01 May 2025
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis
Daria Gitman
Igor Gitman
Evelina Bakhturina
SyDa
49
0
0
01 May 2025
Scaling On-Device GPU Inference for Large Generative Models
Jiuqiang Tang
Raman Sarokin
Ekaterina Ignasheva
Grant Jensen
Lin Chen
Juhyun Lee
Andrei Kulik
Matthias Grundmann
186
1
0
01 May 2025
GPU Performance Portability needs Autotuning
Burkhard Ringlein
Thomas Parnell
Radu Stoica
185
0
0
30 Apr 2025
DNB-AI-Project at SemEval-2025 Task 5: An LLM-Ensemble Approach for Automated Subject Indexing
Lisa Kluge
Maximilian Kähler
180
1
0
30 Apr 2025
DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation
Zhibo Man
Yuanmeng Chen
Yujie Zhang
Jinan Xu
62
0
0
29 Apr 2025