Enhancing Energy-efficiency by Solving the Throughput Bottleneck of LSTM Cells for Embedded FPGAs

4 October 2023

Abstract

To process sensor data in the Internet of Things (IoT), embedded deep learning for 1-dimensional data is an important technique. In the past, CNNs were frequently used because they are simple to optimise for special embedded hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed at energy-efficient inference on end devices. Using traffic speed prediction as a case study, a vanilla LSTM model with the optimised LSTM cell achieves 17534 inferences per second while consuming only 3.8 $\mu$J per inference on the XC7S15 FPGA from the Spartan-7 family. It achieves at least 5.4$\times$ higher throughput and is 1.37$\times$ more energy-efficient than existing approaches.
