Large Language Models (LLMs) enable real-time function calling in edge AI systems but introduce significant computational overhead, leading to high power consumption and carbon emissions. Existing methods optimize for performance while neglecting sustainability, making them inefficient for energy-constrained environments. We introduce CarbonCall, a sustainability-aware function-calling framework that integrates dynamic tool selection, carbon-aware execution, and quantized LLM adaptation. CarbonCall adjusts power thresholds based on real-time carbon intensity forecasts and switches between model variants to sustain high tokens-per-second throughput under power constraints. Experiments on an NVIDIA Jetson AGX Orin show that CarbonCall reduces carbon emissions by up to 52%, power consumption by 30%, and execution time by 30%, while maintaining high efficiency.
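To make the abstract's mechanism concrete, here is a minimal, hypothetical sketch of carbon-aware power budgeting with quantized-variant switching. It is not the paper's implementation: all variant names, power and throughput numbers, and thresholds are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class ModelVariant:
    """A quantized LLM variant with a rough power/throughput profile (hypothetical numbers)."""
    name: str
    avg_power_w: float       # average board power draw while serving this variant
    tokens_per_sec: float    # sustained decoding throughput

# Hypothetical quantization levels of the same base model.
VARIANTS = [
    ModelVariant("llm-fp16", avg_power_w=38.0, tokens_per_sec=21.0),
    ModelVariant("llm-int8", avg_power_w=27.0, tokens_per_sec=18.0),
    ModelVariant("llm-int4", avg_power_w=19.0, tokens_per_sec=14.0),
]

def power_threshold(carbon_gco2_per_kwh: float,
                    low_w: float = 20.0, high_w: float = 40.0,
                    clean: float = 100.0, dirty: float = 500.0) -> float:
    """Map a forecast grid carbon intensity to an allowed power budget:
    the dirtier the grid, the tighter the budget."""
    frac = (carbon_gco2_per_kwh - clean) / (dirty - clean)
    frac = min(max(frac, 0.0), 1.0)
    return high_w - frac * (high_w - low_w)

def select_variant(budget_w: float, min_tps: float = 10.0) -> ModelVariant:
    """Pick the highest-throughput variant that fits the power budget
    and still meets a minimum tokens-per-second target."""
    feasible = [v for v in VARIANTS
                if v.avg_power_w <= budget_w and v.tokens_per_sec >= min_tps]
    if not feasible:                      # fall back to the lightest variant
        return min(VARIANTS, key=lambda v: v.avg_power_w)
    return max(feasible, key=lambda v: v.tokens_per_sec)

if __name__ == "__main__":
    for ci in (120.0, 300.0, 480.0):      # sample carbon-intensity forecasts (gCO2/kWh)
        budget = power_threshold(ci)
        variant = select_variant(budget)
        print(f"{ci:5.0f} gCO2/kWh -> budget {budget:4.1f} W -> {variant.name}")
```

The sketch illustrates the two levers the abstract describes: a power threshold that tightens as forecast carbon intensity rises, and a switch to a more aggressively quantized variant when the budget can no longer sustain the target tokens-per-second rate.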
@article{paramanayakam2025_2504.20348,
  title   = {CarbonCall: Sustainability-Aware Function Calling for Large Language Models on Edge Devices},
  author  = {Varatheepan Paramanayakam and Andreas Karatzas and Iraklis Anagnostopoulos and Dimitrios Stamoulis},
  journal = {arXiv preprint arXiv:2504.20348},
  year    = {2025}
}