ResearchTrend.AI
arXiv:2405.05465
Vidur: A Large-Scale Simulation Framework For LLM Inference
v2 (latest)

8 May 2024
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee, Alexey Tumanov
Tags: VLM
Links: arXiv (abs) · PDF · HTML · GitHub (379★)

Papers citing "Vidur: A Large-Scale Simulation Framework For LLM Inference"

10 of 10 citing papers shown:

  • Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
    Yueying Li, Jim Dai, Tianyi Peng · 10 Apr 2025
  • iServe: An Intent-based Serving System for LLMs
    Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar · VLM · 08 Jan 2025
  • EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
    Yulei Qian, Fengcun Li, Xiangyang Ji, Xiaoyu Zhao, Jianchao Tan, Kai Zhang, Xunliang Cai · MoE · 16 Oct 2024
  • vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
    Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar · VLM · 07 May 2024
  • DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
    Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang · 18 Jan 2024
  • Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus
    Yu Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell · 18 May 2023
  • TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
    Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Basel Alomair, Ion Stoica · MoE · 16 Feb 2021
  • Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training
    Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko · 05 Jun 2020
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
    Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE · 17 Sep 2019
  • cuDNN: Efficient Primitives for Deep Learning
    Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan M. Cohen, J. Tran, Bryan Catanzaro, Evan Shelhamer · 03 Oct 2014