PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
arXiv: 2407.11798 · 16 July 2024
Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari
Tags: LRM
Papers citing "PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation" (7 of 7 shown)
1. Collaborative Speculative Inference for Efficient LLM Inference Serving
   Luyao Gao, Jianchun Liu, Hongli Xu, Xichong Zhang, Yunming Liao, Liusheng Huang
   13 Mar 2025
2. Yi: Open Foundation Models by 01.AI
   01.AI: Alex Young, Bei Chen, Chao Li, ..., Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
   Tags: OSLM, LRM
   07 Mar 2024
3. Mixtral of Experts
   Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
   Tags: MoE, LLMAG
   08 Jan 2024
4. Accelerating LLM Inference with Staged Speculative Decoding
   Benjamin Spector, Chris Ré
   08 Aug 2023
5. Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
   Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
   12 Jul 2023
6. Fast Transformer Decoding: One Write-Head is All You Need
   Noam Shazeer
   06 Nov 2019
7. Pointer Sentinel Mixture Models
   Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
   Tags: RALM
   26 Sep 2016