Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

19 January 2024

Tianle Cai

Papers citing "Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads"

3 / 53 papers shown

Decoding Speculative Decoding. Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman. 02 Feb 2024.
Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. 04 Mar 2022.
Locally Typical Sampling. Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell. 01 Feb 2022.