2403.00858
Cited By
Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs
29 February 2024
Raghavv Goel
Mukul Gagrani
Wonseok Jeon
Junyoung Park
Mingu Lee
Christopher Lott
ALM
Papers citing
"Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs"
4 / 4 papers shown
Title
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park
Dalton Jones
Matthew J Morse
Raghavv Goel
Mingu Lee
Chris Lott
27
0
0
21 Apr 2025
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
Hyun Ryu
Eric Kim
77
3
0
20 Nov 2024
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability
Sudhanshu Agrawal
Wonseok Jeon
Mingu Lee
25
2
0
24 Oct 2024
On Speculative Decoding for Multimodal Large Language Models
Mukul Gagrani
Raghavv Goel
Wonseok Jeon
Junyoung Park
Mingu Lee
Christopher Lott
LRM
40
8
0
13 Apr 2024