Efficient Memory Management for Large Language Model Serving with PagedAttention

12 September 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM

Papers citing "Efficient Memory Management for Large Language Model Serving with PagedAttention"

50 / 412 papers shown
CONGRAD:Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li
Thuy-Trang Vu
Christian Herold
Amirhossein Tebbifakhr
Shahram Khadivi
Gholamreza Haffari
37
0
0
31 Mar 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
229
0
0
28 Mar 2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim
Seongmin Hong
RyeoWook Ko
S. Choi
Hunjong Lee
Junsoo Kim
Joo-Young Kim
Jongse Park
59
0
0
24 Mar 2025
ML-Triton, A Multi-Level Compilation and Language Extension to Triton GPU Programming
Dewei Wang
Wei Zhu
Liyang Ling
Ettore Tiotto
Quintin Wang
Whitney Tsang
Julian Opperman
Jacky Deng
46
0
0
19 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
91
4
0
18 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jingyang Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
67
5
0
18 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
58
3
0
17 Mar 2025
Mitigating KV Cache Competition to Enhance User Experience in LLM Inference
Haiying Shen
Tanmoy Sen
Masahiro Tanaka
234
0
0
17 Mar 2025
Pensez: Less Data, Better Reasoning -- Rethinking French LLM
Huy Hoang Ha
ReLM
LRM
68
1
0
17 Mar 2025
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
Huan Yang
Renji Zhang
Mingzhe Huang
Weijun Wang
Yin Tang
Yuanchun Li
Yunxin Liu
Deyu Zhang
47
0
0
17 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Jun Wang
Jun Wang
216
0
0
15 Mar 2025
Take Off the Training Wheels Progressive In-Context Learning for Effective Alignment
Zhenyu Liu
Dongfang Li
Xinshuo Hu
X. Zhao
Yibin Chen
Baotian Hu
Min-Ling Zhang
49
1
0
13 Mar 2025
Can LLMs Understand Time Series Anomalies?
Zihao Zhou
Rose Yu
AI4TS
92
8
0
13 Mar 2025
AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
Arman Zharmagambetov
Chuan Guo
Ivan Evtimov
Maya Pavlova
Ruslan Salakhutdinov
Kamalika Chaudhuri
77
2
0
12 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Bo Liu
Yunxiang Li
Yangqiu Song
Hanjing Wang
Linyi Yang
Mark Schmidt
Jun Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG
KELM
LRM
AI4CE
94
6
0
12 Mar 2025
Training Plug-n-Play Knowledge Modules with Deep Context Distillation
Lucas Caccia
Alan Ansell
Edoardo Ponti
Ivan Vulić
Alessandro Sordoni
SyDa
265
0
0
11 Mar 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin
Shohaib Mahmud
Haiying Shen
Anand Iyer
MoE
233
1
0
10 Mar 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo
Tong Zheng
Yongyu Mu
Yangqiu Song
Qinghong Zhang
...
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
AI4CE
245
0
0
09 Mar 2025
Reinforcement Learning with Verifiable Rewards: GRPO's Effective Loss, Dynamics, and Success Amplification
Youssef Mroueh
OffRL
43
5
0
09 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Yuhang Liu
Jin Jiang
Hao Fei
Jian Shao
Yueting Zhuang
LRM
ReLM
53
3
0
09 Mar 2025
Seesaw: High-throughput LLM Inference via Model Re-sharding
Qidong Su
Wei Zhao
Xuelong Li
Muralidhar Andoorveedu
Chenhao Jiang
Zhanda Zhu
Kevin Song
Christina Giannoula
Gennady Pekhimenko
LRM
77
0
0
09 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
111
2
0
07 Mar 2025
Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
Benyamin Jamialahmadi
Parsa Kavehzadeh
Mehdi Rezagholizadeh
Parsa Farinneya
Hossein Rajabzadeh
A. Jafari
Boxing Chen
Marzieh S. Tahaei
52
0
0
06 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
66
0
0
06 Mar 2025
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li
Pengfei Zheng
Shuang Chen
Zewei Xu
Yuanhao Lai
Yunfei Du
Zehao Wang
MoE
214
0
0
06 Mar 2025
Topology-Aware Conformal Prediction for Stream Networks
Jifan Zhang
Fangxin Wang
Philip S. Yu
Kaize Ding
Shixiang Zhu
AI4TS
41
0
0
06 Mar 2025
Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents
Jingying Zeng
Hui Liu
Zhenwei Dai
Xianfeng Tang
Chen Luo
Samarth Varshney
Zhen Li
Qi He
HILM
64
1
0
05 Mar 2025
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Jiashun Suo
Xiaojian Liao
Limin Xiao
Li Ruan
Jinquan Wang
Xiao Su
Zhisheng Huo
72
0
0
04 Mar 2025
Alchemist: Towards the Design of Efficient Online Continual Learning System
Yuyang Huang
Yuhan Liu
Haryadi S. Gunawi
Beibin Li
Changho Hwang
CLL
OnRL
106
0
0
03 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM
LRM
93
38
0
03 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu-Xi Cheng
69
0
0
03 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
123
6
0
03 Mar 2025
Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)
Jingtian Yan
Zhifei Li
William Kang
Yulun Zhang
Stephen Smith
Jiaoyang Li
48
0
0
03 Mar 2025
Knowledge Bridger: Towards Training-free Missing Multi-modality Completion
Guanzhou Ke
Shengfeng He
Xinyu Wang
Bo Wang
Guoqing Chao
Yuyao Zhang
Yi Xie
HeXing Su
73
0
0
27 Feb 2025
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng
Xiaolong Jin
Jinyuan Jia
Xinsong Zhang
AAML
195
0
0
27 Feb 2025
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
Jake Poznanski
Jon Borchardt
Jason Dunkelberger
Regan Huff
Daniel Lin
Aman Rangapur
Christopher Wilhelm
Kyle Lo
Luca Soldaini
97
2
0
25 Feb 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Qianli Ma
Dongrui Liu
Qian Chen
Linfeng Zhang
Jing Shao
MoMe
219
0
0
24 Feb 2025
Data-Constrained Synthesis of Training Data for De-Identification
Thomas Vakili
Aron Henriksson
Hercules Dalianis
SyDa
49
0
0
24 Feb 2025
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Weizhe Yuan
Jane Dwivedi-Yu
Song Jiang
Karthik Padthe
Yang Li
...
Ilia Kulikov
Kyunghyun Cho
Yuandong Tian
Jason Weston
Xian Li
ReLM
LRM
66
14
0
24 Feb 2025
Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following
Jie Zeng
Qianyu He
Qingyu Ren
Jiaqing Liang
Yanghua Xiao
Weikang Zhou
Zeye Sun
Fei Yu
86
1
0
24 Feb 2025
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Raeid Saqur
Anastasis Kratsios
Florian Krach
Yannick Limmer
Jacob-Junqi Tian
John Willes
Blanka Horvath
Frank Rudzicz
MoE
58
0
0
24 Feb 2025
CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng
Dianwen Mei
Huishi Qiu
Xujie Chen
Li Liu
Jiang Tian
Zhongchao Shi
56
0
0
24 Feb 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
77
4
0
24 Feb 2025
SQLong: Enhanced NL2SQL for Longer Contexts with LLMs
Dai Quoc Nguyen
Cong Duy Vu Hoang
Duy Vu
Gioacchino Tangari
Thanh Tien Vu
Don Dharmasiri
Yuan-Fang Li
Long Duong
49
0
0
23 Feb 2025
CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
Vignesh Kothapalli
Hamed Firooz
Maziar Sanjabi
68
0
0
21 Feb 2025
CVE-LLM : Ontology-Assisted Automatic Vulnerability Evaluation Using Large Language Models
Rikhiya Ghosh
H. V. Stockhausen
Martin Schmitt
George Marica Vasile
Sanjeev Kumar Karn
Oladimeji Farri
41
1
0
21 Feb 2025
S^3cMath: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
Yuchen Yan
Jin Jiang
Yang Liu
Yixin Cao
Xin Xu
Hao Fei
Xunliang Cai
Jian Shao
ReLM
LRM
KELM
125
8
0
21 Feb 2025
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Joshua Harris
Timothy Laurence
Leo Loman
Fan Grayson
Toby Nonnenmacher
...
Hamish Mohammed
Thomas Finnie
Luke Hounsome
Michael Borowitz
Steven Riley
LM&MA
AI4MH
85
5
0
20 Feb 2025
Multilingual Language Model Pretraining using Machine-translated Data
Jiayi Wang
Yao Lu
Maurice Weber
Max Ryabinin
David Ifeoluwa Adelani
Yihong Chen
Raphael Tang
Pontus Stenetorp
LRM
83
3
0
20 Feb 2025
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Michael Luo
Xiaoxiang Shi
Colin Cai
Tianjun Zhang
Justin Wong
...
Chi Wang
Yanping Huang
Zhifeng Chen
Joseph E. Gonzalez
Ion Stoica
55
3
0
20 Feb 2025