
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration

Main: 6 Pages
4 Figures
Bibliography: 2 Pages

Abstract

Large Language Models (LLMs) exhibit remarkable human-like predictive capabilities. However, it is challenging to deploy LLMs to provide efficient and adaptive inference services at the edge. This paper proposes a novel Cloud-Edge Collaboration framework for LLMs (CE-CoLLM) to tackle these challenges. First, we identify the transmission of LLM contextual data between the cloud and edge as a key performance bottleneck: it introduces substantial communication overhead that dominates overall inference latency and makes naïve cloud-edge collaboration for LLMs inefficient. Second, we introduce a suite of novel techniques into CE-CoLLM, including a latency-aware early exit mechanism and efficient cloud context management, which collectively reduce communication overhead while preserving LLM inference accuracy. Third, we design two adaptive inference modes to accommodate diverse edge environments: (1) a low-latency standalone edge inference mode that enables reliable, independent edge-side LLM inference even under unstable network conditions, and (2) a high-accuracy cloud-edge collaborative inference mode that adaptively leverages cloud resources to enhance prediction accuracy. Extensive experiments on multiple benchmark datasets demonstrate that, compared to conventional cloud-based LLM deployment, CE-CoLLM reduces overall inference time by up to 13.81% and offloads over 84.53% of the computational workload from the cloud to the edge without sacrificing prediction accuracy. The code is provided on GitHub at this https URL.
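
The two inference modes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the exit criterion, so the sketch assumes a simple confidence threshold on the maximum softmax probability at an edge-side early exit. The names `decode_token`, `cloud_fn`, `threshold`, and `standalone` are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_token(
    edge_logits: List[float],
    cloud_fn: Callable[[], int],
    threshold: float = 0.8,   # assumed confidence threshold, not from the paper
    standalone: bool = False,
) -> Tuple[int, str]:
    """Pick the next token at the edge, or defer to the cloud.

    standalone=True  -> always accept the edge early-exit prediction
                        (no network traffic at all).
    standalone=False -> accept the edge prediction only when its confidence
                        reaches `threshold`; otherwise call `cloud_fn`,
                        a stand-in for a request to the cloud-side model.
    """
    probs = softmax(edge_logits)
    token = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[token]
    if standalone or confidence >= threshold:
        return token, "edge"
    return cloud_fn(), "cloud"


# Hypothetical cloud call that always answers with token id 2.
fake_cloud = lambda: 2

confident = decode_token([5.0, 0.0, 0.0], fake_cloud)
uncertain = decode_token([1.0, 0.9, 0.8], fake_cloud)
offline = decode_token([1.0, 0.9, 0.8], fake_cloud, standalone=True)
```

In this sketch a confident edge prediction never touches the network, which is the intuition behind reducing communication overhead; an uncertain one is forwarded to the cloud only in the collaborative mode.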

@article{jin2025_2411.02829,
  title={CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration},
  author={Hongpeng Jin and Yanzhao Wu},
  journal={arXiv preprint arXiv:2411.02829},
  year={2025}
}