Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs

29 February 2024

Papers citing "Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs"


KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments. Junyoung Park, Dalton Jones, Matthew J Morse, Raghavv Goel, Mingu Lee, Chris Lott. 21 Apr 2025.
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding. Hyun Ryu, Eric Kim. 20 Nov 2024.
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability. Sudhanshu Agrawal, Wonseok Jeon, Mingu Lee. 24 Oct 2024.
On Speculative Decoding for Multimodal Large Language Models. Mukul Gagrani, Raghavv Goel, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott. 13 Apr 2024.