Recurrent Drafter for Fast Speculative Decoding in Large Language Models

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

14 March 2024

Yi Wang

Papers citing "Recurrent Drafter for Fast Speculative Decoding in Large Language Models"

5 / 5 papers shown

Title
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment Gregor Bachmann Sotiris Anagnostidis Albert Pumarola Markos Georgopoulos A. Sanakoyeu Yuming Du Edgar Schönfeld Ali K. Thabet Jonas Kohler ALM BDL 93 6 0 31 Jan 2025
Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion Jacob K Christopher Brian Bartoldson Tal Ben-Nun Michael Cardei B. Kailkhura Ferdinando Fioretto DiffM 53 3 0 10 Aug 2024
Speculative Streaming: Fast LLM Inference without Auxiliary Models Nikhil Bhendawade Irina Belousova Qichen Fu Henry Mason Mohammad Rastegari Mahyar Najibi LRM 34 27 0 16 Feb 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding Zack Ankner Rishab Parthasarathy Aniruddha Nrusimha Christopher Rinard Jonathan Ragan-Kelley William Brandon 26 25 0 07 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding Yichao Fu Peter Bailis Ion Stoica Hao Zhang 127 139 0 03 Feb 2024