Efficient Memory Management for Large Language Model Serving with PagedAttention

12 September 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
    VLM

Papers citing "Efficient Memory Management for Large Language Model Serving with PagedAttention"

50 / 412 papers shown
On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation
Xiaonan Jing
Srinivas Billa
Danny Godbout
HILM
45
0
0
16 Oct 2024
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks
Rudra Murthy
Prince Kumar
Praveen Venkateswaran
Danish Contractor
KELM
ALM
ELM
47
1
0
16 Oct 2024
Sequential LLM Framework for Fashion Recommendation
Han Liu
Xianfeng Tang
Tianlang Chen
Jiapeng Liu
Indu Indu
...
Roberto Fernandez Galan
Michael D Porter
Dongmei Jia
Ke Yang
Lian Xiong
AI4TS
37
2
0
15 Oct 2024
Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs
Abdellah El Mekki
Muhammad Abdul-Mageed
LRM
38
0
0
14 Oct 2024
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Yein Park
Chanwoong Yoon
Jungwoo Park
Donghyeon Lee
Minbyul Jeong
Jaewoo Kang
KELM
64
1
0
13 Oct 2024
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback
Heng Chang
Miao Zheng
Fan Yang
Guosheng Dong
Bin Cui
Xin Wu
Zenan Zhou
Wentao Zhang
ALM
51
6
0
12 Oct 2024
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang
Junhong Wu
Chen Wang
Chengqing Zong
Jiaming Zhang
ALM
LRM
74
4
0
11 Oct 2024
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
55
1
0
09 Oct 2024
Learning Evolving Tools for Large Language Models
Guoxin Chen
Zhong Zhang
Xin Cong
Fangda Guo
Yesai Wu
Yankai Lin
Wenzheng Feng
Yasheng Wang
KELM
54
1
0
09 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
17
0
06 Oct 2024
LongGenBench: Long-context Generation Benchmark
Xiang Liu
Peijie Dong
Xuming Hu
Xiaowen Chu
RALM
55
8
0
05 Oct 2024
What do Large Language Models Need for Machine Translation Evaluation?
Shenbin Qian
Archchana Sindhujan
Minnie Kabra
Diptesh Kanojia
Constantin Orasan
Tharindu Ranasinghe
Frédéric Blain
ELM
LRM
ALM
LM&MA
40
0
0
04 Oct 2024
Geometric Collaborative Filtering with Convergence
Hisham Husain
Julien Monteil
FedML
30
0
0
04 Oct 2024
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer
Milan Gritta
Gerasimos Lampouras
Haitham Bou Ammar
Jun Wang
76
4
0
04 Oct 2024
Permissive Information-Flow Analysis for Large Language Models
Shoaib Ahmed Siddiqui
Radhika Gaonkar
Boris Köpf
David M. Krueger
Andrew Paverd
Ahmed Salem
Shruti Tople
Lukas Wutschitz
Menglin Xia
Santiago Zanella Béguelin
38
1
0
04 Oct 2024
Optimizing Adaptive Attacks against Watermarks for Language Models
Abdulrahman Diaa
Toluwani Aremu
Nils Lukas
AAML
WaLM
41
2
0
03 Oct 2024
Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
Shijie Chen
Bernal Jiménez Gutiérrez
Yu Su
33
4
0
03 Oct 2024
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Xiangyu Peng
Congying Xia
Xinyi Yang
Caiming Xiong
Chien-Sheng Wu
Chen Xing
LRM
48
2
0
03 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
48
9
0
03 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models
Zongming Li
Tianheng Cheng
Shoufa Chen
Peize Sun
Haocheng Shen
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
DiffM
136
15
0
03 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
87
1
0
02 Oct 2024
ENTP: Encoder-only Next Token Prediction
Ethan Ewer
Daewon Chae
Thomas Zeng
Jinkyu Kim
Kangwook Lee
38
3
0
02 Oct 2024
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
Linke Song
Zixuan Pang
Wenhao Wang
Zihao Wang
XiaoFeng Wang
Hongbo Chen
Wei Song
Yier Jin
Dan Meng
Rui Hou
56
7
0
30 Sep 2024
Confidential Prompting: Protecting User Prompts from Cloud LLM Providers
In Gim
Caihua Li
Lin Zhong
55
2
0
27 Sep 2024
Open-World Evaluation for Retrieving Diverse Perspectives
Hung-Ting Chen
Eunsol Choi
35
0
0
26 Sep 2024
A-VL: Adaptive Attention for Large Vision-Language Models
Junyang Zhang
Mu Yuan
Ruiguang Zhong
Puhan Luo
Huiyou Zhan
Ningkang Zhang
Chengchen Hu
Xiangyang Li
VLM
43
1
0
23 Sep 2024
Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
Huanxuan Liao
Shizhu He
Yao Xu
Yuanzhe Zhang
Kang Liu
Jun Zhao
LRM
56
3
0
20 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
Hao Fei
Xunliang Cai
Yixin Cao
Liangcai Gao
Zhi Tang
LRM
55
5
0
19 Sep 2024
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair
Mingjie Liu
Yun-Da Tsai
Wenfei Zhou
Haoxing Ren
SyDa
3DV
47
7
0
19 Sep 2024
RUIE: Retrieval-based Unified Information Extraction using Large Language Model
Xincheng Liao
Junwen Duan
Yixi Huang
Jianxin Wang
48
1
0
18 Sep 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague
Fangcong Yin
Juan Diego Rodriguez
Dongwei Jiang
Manya Wadhwa
Prasann Singhal
Xinyu Zhao
Xi Ye
Kyle Mahowald
Greg Durrett
ReLM
LRM
125
89
0
18 Sep 2024
Seek and Solve Reasoning for Table Question Answering
Ruya Jiang
Chun Wang
Weihong Deng
LMTD
ReLM
LRM
48
2
0
09 Sep 2024
InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference
Xiurui Pan
Endian Li
Qiao Li
Shengwen Liang
Yizhou Shan
Ke Zhou
Yingwei Luo
Xiaolin Wang
Jie Zhang
47
10
0
08 Sep 2024
You Only Use Reactive Attention Slice For Long Context Retrieval
Yun Joon Soh
Hanxian Huang
Yuandong Tian
Jishen Zhao
RALM
46
0
0
03 Sep 2024
Efficient LLM Scheduling by Learning to Rank
Yichao Fu
Siqi Zhu
Runlong Su
Aurick Qiao
Ion Stoica
Hao Zhang
58
19
0
28 Aug 2024
An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication
Yunmeng Li
Jun Suzuki
Makoto Morishita
Kaori Abe
Kentaro Inui
65
1
0
28 Aug 2024
FASST: Fast LLM-based Simultaneous Speech Translation
Siqi Ouyang
Xi Xu
Chinmay Dandekar
Lei Li
30
3
0
18 Aug 2024
Beyond the Hype: A dispassionate look at vision-language models in medical scenario
Yang Nan
Huichi Zhou
Xiaodan Xing
Guang Yang
54
3
0
16 Aug 2024
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jaehong Cho
Minsu Kim
Hyunmin Choi
Guseul Heo
Jongse Park
49
9
0
10 Aug 2024
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning
Yuze Zhao
Jintao Huang
Jinghan Hu
Xingjun Wang
Yunlin Mao
...
Zhikai Wu
Baole Ai
Ang Wang
Wenmeng Zhou
Yingda Chen
50
31
0
10 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
79
48
0
05 Aug 2024
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
Kunlun Zhu
Yifan Luo
Dingling Xu
Ruobing Wang
Shi Yu
...
Yishan Li
Zhiyuan Liu
Xu Han
Zhiyuan Liu
Maosong Sun
36
17
0
02 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
42
1
0
01 Aug 2024
AI-Assisted Generation of Difficult Math Questions
Vedant Shah
Dingli Yu
Kaifeng Lyu
Simon Park
Nan Rosemary Ke
...
Yoshua Bengio
Sanjeev Arora
Anirudh Goyal
53
16
0
30 Jul 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
Yuhui Xu
Zhanming Jie
Hanze Dong
Lei Wang
Xudong Lu
Aojun Zhou
Amrita Saha
Caiming Xiong
Doyen Sahoo
75
15
0
30 Jul 2024
Improving Retrieval Augmented Language Model with Self-Reasoning
Yuan Xia
Jingbo Zhou
Zhenhui Shi
Jun Chen
Hai-ting Huang
AIFin
LRM
ReLM
KELM
55
8
0
29 Jul 2024
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
Somshubra Majumdar
Vahid Noroozi
Sean Narenthiran
Aleksander Ficek
Wasi Uddin Ahmad
Jocelyn Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
58
2
0
29 Jul 2024
Towards Aligning Language Models with Textual Feedback
Sauc Abadal Lloret
S. Dhuliawala
K. Murugesan
Mrinmaya Sachan
VLM
50
1
0
24 Jul 2024
Token-Picker: Accelerating Attention in Text Generation with Minimized Memory Transfer via Probability Estimation
Junyoung Park
Myeonggu Kang
Yunki Han
Yang-Gon Kim
Jaekang Shin
Lee-Sup Kim
27
0
0
21 Jul 2024
On the Design and Analysis of LLM-Based Algorithms
Yanxi Chen
Yaliang Li
Bolin Ding
Jingren Zhou
51
5
0
20 Jul 2024