The Synergy of Speculative Decoding and Batching in Serving Large Language Models

28 October 2023

Papers citing "The Synergy of Speculative Decoding and Batching in Serving Large Language Models"

3 / 3 papers shown

Title
PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System Yintao He Haiyu Mao Christina Giannoula Mohammad Sadrosadati Juan Gómez Luna Huawei Li Xiaowei Li Ying Wang O. Mutlu 48 6 0 21 Feb 2025
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Jian Chen Vashisth Tiwari Ranajoy Sadhukhan Zhuoming Chen Jinyuan Shi Ian En-Hsu Yen Ian En-Hsu Yen Avner May Tianqi Chen Beidi Chen LRM 44 22 0 20 Aug 2024
Decoding Speculative Decoding Minghao Yan Saurabh Agarwal Shivaram Venkataraman LRM 45 6 0 02 Feb 2024