Accelerating Production LLMs with Combined Token/Embedding Speculators

29 April 2024

Papers citing "Accelerating Production LLMs with Combined Token/Embedding Speculators"

4 / 4 papers shown

Title
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment Gregor Bachmann Sotiris Anagnostidis Albert Pumarola Markos Georgopoulos A. Sanakoyeu Yuming Du Edgar Schönfeld Ali K. Thabet Jonas Kohler ALM BDL 95 6 0 31 Jan 2025
Mixture of Attentions For Speculative Decoding Matthieu Zimmer Milan Gritta Gerasimos Lampouras Haitham Bou Ammar Jun Wang 76 4 0 04 Oct 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding Zack Ankner Rishab Parthasarathy Aniruddha Nrusimha Christopher Rinard Jonathan Ragan-Kelley William Brandon 29 25 0 07 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding Yichao Fu Peter Bailis Ion Stoica Hao Zhang 130 141 0 03 Feb 2024