PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
arXiv: 2407.11798 · 16 July 2024
Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari
Tags: LRM
Papers citing "PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation" (7 of 7 shown)
1. Collaborative Speculative Inference for Efficient LLM Inference Serving
   Luyao Gao, Jianchun Liu, Hongli Xu, Xichong Zhang, Yunming Liao, Liusheng Huang
   13 Mar 2025
2. Yi: Open Foundation Models by 01.AI
   01.AI: Alex Young, Bei Chen, Chao Li, ..., Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
   Tags: OSLM, LRM
   07 Mar 2024
3. Mixtral of Experts
   Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
   Tags: MoE, LLMAG
   08 Jan 2024
4. Accelerating LLM Inference with Staged Speculative Decoding
   Benjamin Spector, Chris Ré
   08 Aug 2023
5. Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
   Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
   12 Jul 2023
6. Fast Transformer Decoding: One Write-Head is All You Need
   Noam Shazeer
   06 Nov 2019
7. Pointer Sentinel Mixture Models
   Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
   Tags: RALM
   26 Sep 2016