ResearchTrend.AI

SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving

11 June 2025
Xiangchen Li
Dimitrios Spatharakis
Saeid Ghafouri
Jiakun Fan
Hans Vandierendonck
Deepu John
Bo Ji
Dimitrios S. Nikolopoulos
Main: 6 pages, 6 figures; bibliography: 2 pages
Abstract

Despite advances in device capabilities, efficiently running inference for advanced large language models (LLMs) at the edge remains challenging due to limited device memory and power constraints. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or incur substantial costs. This position paper recasts speculative decoding, previously viewed primarily as a technique for accelerating autoregressive LLM generation, as an approach specifically adapted for edge computing that orchestrates computation across heterogeneous devices. We propose SLED, a method that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server efficiently batches and verifies the tokens using a more precise target model. This approach supports device heterogeneity and reduces the server-side memory footprint by avoiding the need to deploy multiple target models. Our initial experiments with a Jetson Orin Nano, Raspberry Pi 4B/5, and an edge server equipped with 4 NVIDIA A100 GPUs indicate substantial benefits: significantly increased system throughput and capacity, and better cost efficiency, all without sacrificing model accuracy.
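The draft-then-verify loop the abstract describes can be illustrated with a minimal sketch. This is not the authors' SLED implementation: the two "models" below are hypothetical deterministic toy functions standing in for the edge-side draft model and the server-side target model, and the verify phase uses greedy acceptance, which guarantees output identical to decoding with the target model alone.

```python
# Toy stand-ins for the draft and target LLMs (illustrative only):
# deterministic next-token functions over integer tokens.

def target_next(prefix):
    # "Large" target model: defines the ground-truth greedy next token.
    return (sum(prefix) * 31 + 7) % 11

def draft_next(prefix):
    # "Small" draft model: agrees with the target most of the time,
    # but is wrong on every fourth step.
    t = target_next(prefix)
    return t if len(prefix) % 4 else (t + 1) % 11

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) < len(prompt) + n_tokens:
        # 1. Draft phase (edge device): propose k tokens autoregressively
        #    with the cheap model.
        drafts = []
        for _ in range(k):
            drafts.append(draft_next(out + drafts))
        # 2. Verify phase (edge server): check the drafted tokens against
        #    the target model; accept the longest matching prefix, and at
        #    the first mismatch emit the target's own token instead.
        accepted = []
        for t in drafts:
            expected = target_next(out + accepted)
            accepted.append(expected)
            if t != expected:
                break
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n_tokens]

print(speculative_decode([1, 2], 10))
```

Because every accepted token is the target model's greedy choice, the output matches pure target-model decoding; the speedup in a real system comes from the server verifying all k drafted positions in one batched forward pass, and from batching verification requests from many edge devices against a single shared target model.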

@article{li2025_2506.09397,
  title={SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving},
  author={Xiangchen Li and Dimitrios Spatharakis and Saeid Ghafouri and Jiakun Fan and Hans Vandierendonck and Deepu John and Bo Ji and Dimitrios Nikolopoulos},
  journal={arXiv preprint arXiv:2506.09397},
  year={2025}
}