AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
arXiv:2302.11665

22 February 2023
Zhuohan Li
Lianmin Zheng
Yinmin Zhong
Vincent Liu
Ying Sheng
Xin Jin
Yanping Huang
Zhifeng Chen
Hao Zhang
Joseph E. Gonzalez
Ion Stoica

Papers citing "AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving"

iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar
08 Jan 2025

Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, Yiying Zhang
08 May 2024

Towards Pareto Optimal Throughput in Small Language Model Serving
Pol G. Recasens, Yue Zhu, Chen Wang, Eun Kyung Lee, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Ll. Berral
04 Apr 2024

MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
Jiaang Duan, Shiyou Qian, Dingyu Yang, Hanwen Hu, Jian Cao, Guangtao Xue
03 Apr 2024

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He, Jidong Zhai
18 Mar 2024

Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg
27 Feb 2024

DEAP: Design Space Exploration for DNN Accelerator Parallelism
Ekansh Agrawal, Xiangyu Sam Xu
24 Dec 2023

Splitwise: Efficient generative LLM inference using phase splitting
Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, Ricardo Bianchini
30 Nov 2023

Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider
Mohammad Shahrad, Rodrigo Fonseca, Íñigo Goiri, G. Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, M. Russinovich, Ricardo Bianchini
06 Mar 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019