Preble: Efficient Distributed Prompt Scheduling for LLM Serving


8 May 2024
Vikranth Srivatsa
Zijian He
Reyna Abhyankar
Dongming Li
Yiying Zhang

Papers citing "Preble: Efficient Distributed Prompt Scheduling for LLM Serving"

41 / 41 papers shown
GenTorrent: Scaling Large Language Model Serving with An Overley Network
Fei Fang
Yifan Hua
Shengze Wang
Ruilin Zhou
Y. Liu
Chen Qian
Wei Wei
108
0
0
27 Apr 2025
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents
Junkai Li
Yunghwei Lai
Weitao Li
Jingyi Ren
Meng Zhang
...
Siyu Wang
Ziwei Sun
Yanzhe Zhang
Weizhi Ma
Yang Liu
LLMAG, LM&MA, LM&Ro, MedIm
152
122
0
20 Jan 2025
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin
Zhelun Shi
Jiwen Yu
Xijun Wang
Enshen Zhou
...
Lu Sheng
Jing Shao
Junlin Wu
Wanli Ouyang
Ruimao Zhang
EGVM, VGen
192
471
0
23 Oct 2024
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
Cunchen Hu
Heyang Huang
Junhao Hu
Jiang Xu
Xusheng Chen
...
Chenxi Wang
Sa Wang
Yungang Bao
Ninghui Sun
Yizhou Shan
LLMAG
84
30
0
25 Jun 2024
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Ruoyu Qin
Zheming Li
Weiran He
Mingxing Zhang
Yongwei Wu
Weimin Zheng
Xinran Xu
104
66
0
24 Jun 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Tsendsuren Munkhdalai
Manaal Faruqui
Siddharth Gopal
LRM, LLMAG, CLL
128
119
0
10 Apr 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
Zhicheng Guo
Sijie Cheng
Hao Wang
Shihao Liang
Yujia Qin
Peng Li
Zhiyuan Liu
Maosong Sun
Yang Liu
ELM
127
30
0
12 Mar 2024
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal
Nitin Kedia
Ashish Panwar
Jayashree Mohan
Nipun Kwatra
Bhargav S. Gulavani
Alexey Tumanov
Ramachandran Ramjee
90
183
0
04 Mar 2024
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Jordan Juravsky
Bradley Brown
Ryan Ehrlich
Daniel Y. Fu
Christopher Ré
Azalia Mirhoseini
104
40
0
07 Feb 2024
More Agents Is All You Need
Junyou Li
Qin Zhang
Yangbin Yu
Qiang Fu
Deheng Ye
LLMAG
174
72
0
03 Feb 2024
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Yinmin Zhong
Shengyu Liu
Junda Chen
Jianbo Hu
Yibo Zhu
Xuanzhe Liu
Xin Jin
Hao Zhang
81
203
0
18 Jan 2024
Splitwise: Efficient generative LLM inference using phase splitting
Pratyush Patel
Esha Choukse
Chaojie Zhang
Aashaka Shah
Íñigo Goiri
Saeed Maleki
Ricardo Bianchini
81
243
0
30 Nov 2023
LooGLE: Can Long-Context Language Models Understand Long Contexts?
Jiaqi Li
Mengmeng Wang
Zilong Zheng
Muhan Zhang
ELM, RALM
77
133
0
08 Nov 2023
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
In Gim
Guojun Chen
Seung-seob Lee
Nikhil Sarda
Anurag Khandelwal
Lin Zhong
82
87
0
07 Nov 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu
Matei A. Zaharia
Pieter Abbeel
95
255
0
03 Oct 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
56
120
0
25 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
170
161
0
19 Sep 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
192
2,311
0
12 Sep 2023
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Amey Agrawal
Ashish Panwar
Jayashree Mohan
Nipun Kwatra
Bhargav S. Gulavani
Ramachandran Ramjee
AI4TS, LRM
77
107
0
31 Aug 2023
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu
Gagan Bansal
Jieyu Zhang
Yiran Wu
Beibin Li
...
Jiale Liu
Ahmed Hassan Awadallah
Ryen W. White
Doug Burger
Chi Wang
LLMAG, AI4CE
96
383
0
16 Aug 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Yujia Qin
Shi Liang
Yining Ye
Kunlun Zhu
Lan Yan
...
Jie Zhou
Mark B. Gerstein
Dahai Li
Zhiyuan Liu
Maosong Sun
CLL, ALM, LLMAG, ELM, LM&MA
180
707
0
31 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
410
4,422
0
09 Jun 2023
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Shibo Hao
Tianyang Liu
Zhen Wang
Zhiting Hu
RALM, LLMAG
126
182
0
19 May 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao
Dian Yu
Jeffrey Zhao
Izhak Shafran
Thomas Griffiths
Yuan Cao
Karthik Narasimhan
LM&Ro, LRM, AI4CE
161
2,025
0
17 May 2023
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Zeyu Wang
...
Chunan Shi
Zhuoming Chen
Daiyaan Arfeen
Reyna Abhyankar
Zhihao Jia
LRM
105
155
0
16 May 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Joey Tianyi Zhou
130
140
0
11 May 2023
Fast Distributed Inference Serving for Large Language Models
Bingyang Wu
Yinmin Zhong
Zili Zhang
Gang Huang
Xuanzhe Liu
Xin Jin
68
102
0
10 May 2023
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Erik Nijkamp
A. Ghobadzadeh
Caiming Xiong
Silvio Savarese
Yingbo Zhou
208
169
0
03 May 2023
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Zhuohan Li
Lianmin Zheng
Yinmin Zhong
Vincent Liu
Ying Sheng
...
Yanping Huang
Zhifeng Chen
Hao Zhang
Joseph E. Gonzalez
Ion Stoica
MoE
28
68
0
22 Feb 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa, RALM
162
1,766
0
09 Feb 2023
Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLM, LRM
150
632
0
07 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG, ReLM, LRM
436
2,955
0
06 Oct 2022
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Reza Yazdani Aminabadi
Samyam Rajbhandari
Minjia Zhang
A. A. Awan
Cheng-rong Li
...
Elton Zheng
Jeff Rasley
Shaden Smith
Olatunji Ruwase
Yuxiong He
76
369
0
30 Jun 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM, BDL, LRM, AI4CE
524
3,721
0
21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
833
9,644
0
28 Jan 2022
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Wenlong Huang
Pieter Abbeel
Deepak Pathak
Igor Mordatch
LM&Ro
99
1,122
0
18 Jan 2022
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM, AIMat, ALM
259
703
0
20 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
97
506
0
18 May 2021
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
131
2,999
0
17 Dec 2020
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar
Xingdi Yuan
Marc-Alexandre Côté
Yonatan Bisk
Adam Trischler
Matthew J. Hausknecht
LM&Ro, LLMAG
92
443
0
08 Oct 2020
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,363
0
12 Jun 2017