arXiv:2309.06180
Efficient Memory Management for Large Language Model Serving with PagedAttention
12 September 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Hao Zhang
Ion Stoica
VLM
Papers citing "Efficient Memory Management for Large Language Model Serving with PagedAttention"
50 / 412 papers shown
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Joshua Harris
Timothy Laurence
Leo Loman
Fan Grayson
Toby Nonnenmacher
...
Hamish Mohammed
Thomas Finnie
Luke Hounsome
Michael Borowitz
Steven Riley
LM&MA
AI4MH
85
5
0
20 Feb 2025
Grounding LLM Reasoning with Knowledge Graphs
Alfonso Amayuelas
Joy Prakash Sain
Simerjot Kaur
Charese Smiley
93
0
0
18 Feb 2025
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu
Shaoning Sun
Xiaohui Hu
Jiaxu Yan
Kaidong Yu
Xuelong Li
ELM
95
5
0
17 Feb 2025
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Weizhe Chen
Zhicheng Zhang
Guanlin Liu
Renjie Zheng
Wenlei Shi
Chen Dun
Zheng Wu
Xing Jin
Lin Yan
ALM
LRM
56
1
0
17 Feb 2025
Towards Reasoning Ability of Small Language Models
Gaurav Srivastava
Shuxiang Cao
Xuan Wang
ReLM
LRM
62
7
0
17 Feb 2025
Presumed Cultural Identity: How Names Shape LLM Responses
Siddhesh Pawar
Arnav Arora
Lucie-Aimée Kaffee
Isabelle Augenstein
58
0
0
17 Feb 2025
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
108
15
0
17 Feb 2025
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall
Cheng Perng Phoo
Mia Chiquier
Bharath Hariharan
Kavita Bala
Carl Vondrick
79
1
0
17 Feb 2025
Counterfactual-Consistency Prompting for Relative Temporal Understanding in Large Language Models
Jongho Kim
Seung-won Hwang
LRM
AI4CE
60
1
0
17 Feb 2025
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
Pengcheng Jiang
Lang Cao
Ruike Zhu
Minhao Jiang
Yunyi Zhang
Jiashuo Sun
Jiawei Han
RALM
88
0
0
16 Feb 2025
Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity
Junhao Hu
Wenrui Huang
Weidong Wang
Zhenwen Li
Tiancheng Hu
Zhixia Liu
Xusheng Chen
Tao Xie
Yizhou Shan
LRM
53
0
0
16 Feb 2025
Probing Semantic Routing in Large Mixture-of-Expert Models
M. L. Olson
Neale Ratzlaff
Musashi Hinck
Man Luo
Sungduk Yu
Chendi Xue
Vasudev Lal
MoE
LRM
57
2
0
15 Feb 2025
Ten Challenging Problems in Federated Foundation Models
Tao Fan
Hanlin Gu
Xuemei Cao
Chee Seng Chan
Qian Chen
...
Yuanyuan Zhang
Xiaojin Zhang
Zhenzhe Zheng
Lixin Fan
Qiang Yang
FedML
89
4
0
14 Feb 2025
Typhoon T1: An Open Thai Reasoning Model
Pittawat Taveekitworachai
Potsawee Manakul
Kasima Tharnpipitchai
Kunat Pipatanakul
OffRL
LRM
102
0
0
13 Feb 2025
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Heejun Lee
G. Park
Jaduk Suh
Sung Ju Hwang
89
3
0
13 Feb 2025
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Zhenxing Mi
Kuan-Chieh Jackson Wang
Guocheng Qian
Hanrong Ye
Runtao Liu
Sergey Tulyakov
Kfir Aberman
Dan Xu
LRM
47
0
0
12 Feb 2025
Auditing Prompt Caching in Language Model APIs
Chenchen Gu
Xiang Lisa Li
Rohith Kuditipudi
Percy Liang
Tatsunori Hashimoto
78
0
0
11 Feb 2025
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Adithya Pratapa
Teruko Mitamura
94
1
0
10 Feb 2025
InSTA: Towards Internet-Scale Training For Agents
Brandon Trabucco
Gunnar A. Sigurdsson
Robinson Piramuthu
Ruslan Salakhutdinov
ALM
106
2
0
10 Feb 2025
Preventing Rogue Agents Improves Multi-Agent Collaboration
Ohav Barbi
Ori Yoran
Mor Geva
55
1
0
09 Feb 2025
Iterative Deepening Sampling for Large Language Models
Weizhe Chen
Sven Koenig
B. Dilkina
LRM
ReLM
88
1
0
08 Feb 2025
DeepThink: Aligning Language Models with Domain-Specific User Intents
Yang Li
Mingxuan Luo
Yeyun Gong
Chen Lin
Jian Jiao
Yi Liu
Kaili Huang
LRM
ALM
ELM
59
0
0
08 Feb 2025
Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction
Shengbin Yue
Ting Huang
Zheng Jia
Siyuan Wang
Shujun Liu
Yun Song
Xuanjing Huang
Zhongyu Wei
AILaw
ELM
69
0
0
08 Feb 2025
Optimizing Temperature for Language Models with Multi-Sample Inference
Weihua Du
Yiming Yang
Sean Welleck
67
2
0
07 Feb 2025
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Yuanye Liu
Jiahang Xu
Li Zhang
Qi Chen
Xuan Feng
Yang Chen
Zhongxin Guo
Yuqing Yang
Cheng Peng
84
2
0
06 Feb 2025
Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment
Haoyu Wang
Zeyu Qin
Li Shen
Xueqian Wang
Minhao Cheng
Dacheng Tao
99
2
0
06 Feb 2025
Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning
C. Lin
Jiaming Tang
Shuo Yang
Hanshuo Wang
Tian Tang
Boyu Tian
Ion Stoica
Enze Xie
Mingyu Gao
97
2
0
04 Feb 2025
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu
Changsheng Zhao
Hanxian Huang
Sijia Chen
Jing Zhang
...
Yuandong Tian
Bilge Soran
Raghuraman Krishnamoorthi
Tijmen Blankevoort
Vikas Chandra
MQ
81
5
0
04 Feb 2025
Scaling Embedding Layers in Language Models
Da Yu
Edith Cohen
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Daogao Liu
Chiyuan Zhang
87
0
0
03 Feb 2025
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu
Yao Chen
Zeyi Wen
Weiguo Liu
Bingsheng He
84
1
0
02 Feb 2025
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
96
2
0
02 Feb 2025
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
Nadav Timor
Jonathan Mamou
Daniel Korat
Moshe Berchansky
Oren Pereg
Gaurav Jain
Roy Schwartz
Moshe Wasserblat
David Harel
98
2
0
31 Jan 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
Zhiwei Xue
Yuxiao Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
133
16
0
30 Jan 2025
Diverse Preference Optimization
Jack Lanchantin
Angelica Chen
S. Dhuliawala
Ping Yu
Jason Weston
Sainbayar Sukhbaatar
Ilia Kulikov
107
4
0
30 Jan 2025
GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments
Yanyu Chen
Ganhong Huang
113
0
0
28 Jan 2025
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Yuan Feng
Junlin Lv
Yukun Cao
Xike Xie
S. K. Zhou
VLM
66
29
0
28 Jan 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur
G. Oliva
Dayi Lin
Ahmed E. Hassan
49
1
0
28 Jan 2025
Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun
M. Schaar
94
14
0
28 Jan 2025
360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation
Hamed Firooz
Maziar Sanjabi
Adrian Englhardt
Aman Gupta
Ben Levine
...
Xiaoling Zhai
Ya Xu
Yu Wang
Yun Dai
ALM
53
3
0
27 Jan 2025
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li
Xuyang Hu
Xiaoye Qu
Linjie Li
Yu-Xi Cheng
53
3
0
22 Jan 2025
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Haotian Luo
Li Shen
Haiying He
Yishuo Wang
Shiwei Liu
Wei Li
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
VLM
LRM
92
50
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Zheng Yang
VLM
ALM
OffRL
AI4TS
LRM
120
163
0
22 Jan 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Yongxu Liu
Chen Zhao
Arman Cohan
61
5
0
21 Jan 2025
Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration
Thomas Walshe
S. Moon
Chunyang Xiao
Yawwani Gunawardana
Fran Silavong
50
2
0
21 Jan 2025
ALoFTRAG: Automatic Local Fine Tuning for Retrieval Augmented Generation
Peter Devine
47
0
0
21 Jan 2025
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Han Guo
William Brandon
Radostin Cholakov
Jonathan Ragan-Kelley
Eric P. Xing
Yoon Kim
MQ
94
12
0
20 Jan 2025
Aligning Instruction Tuning with Pre-training
Yiming Liang
Tianyu Zheng
Xinrun Du
Ge Zhang
Qingbin Liu
...
Zhaoxiang Zhang
Wenhao Huang
Jiajun Zhang
Xiang Yue
96
1
0
16 Jan 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
223
1
0
15 Jan 2025
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory
Jerry Chee
A. Backurs
Rainie Heck
Li Zhang
Janardhan Kulkarni
Thomas Rothvoss
Sivakanth Gopi
MQ
54
0
0
11 Jan 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
253
0
0
08 Jan 2025