ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.06180
  4. Cited By
Efficient Memory Management for Large Language Model Serving with
  PagedAttention

Efficient Memory Management for Large Language Model Serving with PagedAttention

12 September 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
    VLM
ArXivPDFHTML

Papers citing "Efficient Memory Management for Large Language Model Serving with PagedAttention"

50 / 412 papers shown
Title
DEBATE, TRAIN, EVOLVE: Self Evolution of Language Model Reasoning
DEBATE, TRAIN, EVOLVE: Self Evolution of Language Model Reasoning
Gaurav Srivastava
Zhenyu Bi
Meng Lu
Xuan Wang
LLMAG
LRM
7
0
0
21 May 2025
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Weixiang Zhao
Xingyu Sui
Yulin Hu
Jiahe Guo
Haixiao Liu
Biye Li
Yanyan Zhao
Bing Qin
Ting Liu
OffRL
14
0
0
21 May 2025
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners
Weixiang Zhao
Jiahe Guo
Yang Deng
Tongtong Wu
Wenxuan Zhang
...
Yanyan Zhao
Wanxiang Che
Bing Qin
Tat-Seng Chua
Ting Liu
LRM
12
0
0
21 May 2025
Large Language Models for Data Synthesis
Large Language Models for Data Synthesis
Yihong Tang
Menglin Kong
Lijun Sun
SyDa
24
0
0
20 May 2025
The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute
The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute
Yunho Jin
Gu-Yeon Wei
David Brooks
LRM
7
0
0
20 May 2025
GuRE:Generative Query REwriter for Legal Passage Retrieval
GuRE:Generative Query REwriter for Legal Passage Retrieval
Daehee Kim
Deokhyung Kang
Jonghwi Kim
Sangwon Ryu
Gary Geunbae Lee
RALM
AILaw
29
0
0
19 May 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw
ELM
33
0
0
19 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
YangouOuyang
...
Feiyu Xiong
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
7
0
0
18 May 2025
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
Hao Mark Chen
Guanxi Lu
Yasuyuki Okoshi
Zhiwen Mo
Masato Motomura
Hongxiang Fan
LRM
7
0
0
16 May 2025
Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
Yexiang Liu
Zekun Li
Zhi Fang
Nan Xu
Ran He
Tieniu Tan
LRM
17
0
0
16 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su
Yuechi Zhou
Quantong Qiu
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
22
0
0
16 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
Disentangling Reasoning and Knowledge in Medical Large Language Models
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELM
AI4MH
LM&MA
LRM
25
0
0
16 May 2025
Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models
Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models
Camille Couturier
Spyros Mastorakis
Haiying Shen
Saravan Rajmohan
Victor Rühle
KELM
17
0
0
16 May 2025
Token-Level Uncertainty Estimation for Large Language Model Reasoning
Tunyu Zhang
Haizhou Shi
Yibin Wang
Hengyi Wang
Xiaoxiao He
...
Ligong Han
Kai Xu
Huatian Zhang
Dimitris N. Metaxas
Hao Wang
LRM
19
0
0
16 May 2025
CAMEO: Collection of Multilingual Emotional Speech Corpora
CAMEO: Collection of Multilingual Emotional Speech Corpora
Iwona Christop
Maciej Czajka
21
0
0
16 May 2025
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yinsicheng Jiang
Yao Fu
Yeqi Huang
Ping Nie
Zhan Lu
...
Dayou Du
Tairan Xu
Kai Zou
Edoardo Ponti
Luo Mai
MoE
22
0
0
16 May 2025
MatTools: Benchmarking Large Language Models for Materials Science Tools
MatTools: Benchmarking Large Language Models for Materials Science Tools
Siyu Liu
Jiamin Xu
Beilin Ye
Bo Hu
David J. Srolovitz
Tongqi Wen
17
0
0
16 May 2025
TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference
TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference
Raja Gond
Nipun Kwatra
Ramachandran Ramjee
12
0
0
16 May 2025
Improve Rule Retrieval and Reasoning with Self-Induction and Relevance ReEstimate
Improve Rule Retrieval and Reasoning with Self-Induction and Relevance ReEstimate
Ziyang Huang
Wangtao Sun
Jun Zhao
Kang Liu
LRM
17
0
0
16 May 2025
The Hitchhikers Guide to Production-ready Trustworthy Foundation Model powered Software (FMware)
The Hitchhikers Guide to Production-ready Trustworthy Foundation Model powered Software (FMware)
Kirill Vasilevski
Benjamin Rombaut
Gopi Krishnan Rajbahadur
G. Oliva
Keheliya Gallaba
...
Bouyan Chen
Kishanthan Thangarajah
Ahmed E. Hassan
Zhen Ming
Jiang
22
0
0
15 May 2025
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
24
1
0
15 May 2025
Analog Foundation Models
Analog Foundation Models
Julian Büchel
Iason Chalas
Giovanni Acampa
An Chen
Omobayode Fagbohungbe
Sidney Tsai
Kaoutar El Maghraoui
Manuel Le Gallo
Abbas Rahimi
Abu Sebastian
MQ
35
0
0
14 May 2025
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor
Seungbeom Choi
Jeonghoe Goo
Eunjoo Jeon
Mingyu Yang
Minsung Jang
21
0
0
14 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
31
0
0
14 May 2025
Multimodal Survival Modeling in the Age of Foundation Models
Multimodal Survival Modeling in the Age of Foundation Models
Steven Song
Morgan Borjigin-Wang
Irene Madejski
Robert L. Grossman
28
0
0
12 May 2025
Learning from Peers in Reasoning Models
Learning from Peers in Reasoning Models
Tongxu Luo
Wenyu Du
Jiaxi Bi
Stephen Chung
Zhengyang Tang
Hao Yang
M. Zhang
Benyou Wang
LRM
41
0
0
12 May 2025
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
Arun S. Maiya
KELM
34
0
0
12 May 2025
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang
Weijie Xu
Fanyou Wu
Chandan K. Reddy
31
0
0
12 May 2025
On the Robustness of Reward Models for Language Model Alignment
On the Robustness of Reward Models for Language Model Alignment
Jiwoo Hong
Noah Lee
Eunki Kim
Guijin Son
Woojin Chung
Aman Gupta
Shao Tang
James Thorne
29
0
0
12 May 2025
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu
Jianian Zhu
Yong Li
Haojie Wang
Biao Hou
Jidong Zhai
45
0
0
12 May 2025
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
Xiaolin Huang
Weiwen Liu
Xingshan Zeng
Yanhua Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Yishuo Wang
R. Tang
Defu Lian
KELM
36
0
0
12 May 2025
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
Shucheng Huang
Freda Shi
Chen Sun
Jiaming Zhong
Minghao Ning
Yufeng Yang
Yukun Lu
Hong Wang
A. Khajepour
33
0
0
11 May 2025
I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference
I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference
Zibo Gao
Junjie Hu
Feng Guo
Yixin Zhang
Yinglong Han
Siyuan Liu
Haiyang Li
Zhiqiang Lv
33
0
0
10 May 2025
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris
Fan Grayson
Felix Feldman
Timothy Laurence
Toby Nonnenmacher
...
Leo Loman
Selina Patel
Thomas Finnie
Samuel Collins
Michael Borowitz
AI4MH
LM&MA
ELM
54
0
0
09 May 2025
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Zehao Fan
Garrett Gagnon
Zhenyu Liu
Liu Liu
31
0
0
09 May 2025
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung
Jiachen Liu
Jeff J. Ma
Ruofan Wu
Oh Jun Kweon
Yuxuan Xia
Zhiyu Wu
Mosharaf Chowdhury
33
0
0
09 May 2025
CellVerse: Do Large Language Models Really Understand Cell Biology?
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang
Tianyu Liu
Zhihong Zhu
Yu Wang
Haoyu Wang
Donghao Zhou
Yefeng Zheng
Kun Wang
X. Wu
Pheng-Ann Heng
ELM
41
0
0
09 May 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
244
0
0
09 May 2025
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
Massimiliano Pronesti
Joao Bettencourt-Silva
Paul Flanagan
Alessandra Pascale
Oisin Redmond
Anya Belz
Yufang Hou
38
0
0
09 May 2025
Faster MoE LLM Inference for Extremely Large Models
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
68
0
0
06 May 2025
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu
Jiarong Xing
Yifan Qiao
Mingyuan Ma
Y. Li
...
Shiyi Cao
Ke Bao
Ion Stoica
Harry Xu
Ying Sheng
36
0
0
06 May 2025
Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
Xueguang Ma
Luyu Gao
Shengyao Zhuang
Jiaqi Samantha Zhan
Jamie Callan
Jimmy Lin
205
0
0
05 May 2025
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Yushen Chen
Jiawei Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
38
0
0
05 May 2025
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
Arnab Sanyal
Prithwish Mukherjee
Gourav Datta
Sandeep P. Chinchali
MQ
207
0
0
05 May 2025
Adaptive Thinking via Mode Policy Optimization for Social Language Agents
Adaptive Thinking via Mode Policy Optimization for Social Language Agents
Minzheng Wang
You Li
Haozhao Wang
Xinghua Zhang
Nan Xu
Bingli Wu
Fei Huang
Haiyang Yu
Wenji Mao
LLMAG
LRM
43
1
0
04 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
60
2
0
04 May 2025
Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
Xingyu Zhou
Wei Long
Jingbo Lu
Shiyin Jiang
Weiyi You
Haifeng Wu
Shuhang Gu
48
0
0
04 May 2025
Accelerating Large Language Model Reasoning via Speculative Search
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
M. Yuan
Jianye Hao
Feng Wu
ReLM
LRM
75
1
0
03 May 2025
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
61
0
0
02 May 2025
123456789
Next