ResearchTrend.AI


EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

26 January 2024
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
Abstract

Auto-regressive decoding makes inference with Large Language Models (LLMs) time-consuming. We propose EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a simple framework for lossless acceleration. Unlike traditional speculative sampling methods, EAGLE runs the drafting process auto-regressively at the more regular second-top-layer feature level and resolves the sampling uncertainty in next-feature prediction by integrating the token from one time step ahead. The acceleration provided by EAGLE is lossless: it involves no fine-tuning of the target LLM, and the generated text follows the same distribution as vanilla auto-regressive decoding. As of the submission of this paper, EAGLE is the fastest known framework within the speculative sampling family. On MT-bench, EAGLE is 3x faster than vanilla decoding, 2x faster than Lookahead, and 1.6x faster than Medusa. Using gpt-fast, EAGLE attains on average 160 tokens/s with LLaMA2-Chat 13B on a single RTX 3090 GPU, compared with 24 tokens/s for the Hugging Face implementation.
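The "lossless" guarantee comes from the standard speculative-sampling acceptance rule (drafted tokens are accepted or resampled so that the output provably follows the target model's distribution); EAGLE changes how drafts are produced, not this verification step. As a rough sketch of that rule — not EAGLE's actual implementation, and with illustrative names — one accept/reject step over a toy vocabulary might look like:

```python
import random

def speculative_accept(p, q, x, rng=random.random):
    """One step of the generic speculative-sampling acceptance rule.

    p: target-model distribution over the vocabulary (list of floats)
    q: draft-model distribution from which token index x was proposed
       (so q[x] > 0 by construction)
    Returns the token actually emitted. Over many trials the emitted
    token is distributed exactly according to p, which is what makes
    the acceleration lossless.
    """
    # Accept the drafted token with probability min(1, p[x] / q[x]).
    if rng() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the residual max(0, p - q), renormalized.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    r = rng() * total
    acc = 0.0
    for i, w in enumerate(residual):
        acc += w
        if r <= acc:
            return i
    return len(p) - 1  # guard against floating-point round-off
```

When the draft and target distributions agree, every drafted token is accepted; the further q drifts from p, the more often the (slower) resampling path is taken, which is why better drafts translate directly into higher speedup.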

@article{li2024_2401.15077,
  title={EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty},
  author={Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  journal={arXiv preprint arXiv:2401.15077},
  year={2024}
}