'1'-bit Count-based Sorting Unit to Reduce Link Power in DNN Accelerators
Ruichi Han
Yizhi Chen
Tong Lei
Jordi Altayo Gonzalez
Ahmed Hemani
Main: 3 Pages
7 Figures
Bibliography: 2 Pages
1 Table
Abstract
Interconnect power consumption remains a bottleneck in Deep Neural Network (DNN) accelerators. Ordering data by '1'-bit count can mitigate this by reducing switching activity on the links, but practical hardware implementations of such sorting remain underexplored. This work proposes a hardware implementation of a comparison-free sorting unit optimized for Convolutional Neural Networks (CNNs). By leveraging approximate computing to group population counts into coarse-grained buckets, the design reduces hardware area while preserving the link power benefits of data reordering. The approximate sorting unit achieves up to 35.4% area reduction while maintaining a 19.50% reduction in bit transitions (BT), compared with 20.42% for the precise implementation.
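The idea in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design; the function names, the 8-bit word width, the choice of four buckets, and the linear popcount-to-bucket mapping are all assumptions made for illustration. The sketch places each word in a bucket indexed by its '1'-bit count (so no pairwise comparisons are needed) and counts bit transitions between consecutive words as a rough proxy for link switching activity.

```python
from typing import List


def popcount(x: int) -> int:
    """Number of '1' bits in a non-negative integer."""
    return bin(x).count("1")


def bit_transitions(words: List[int]) -> int:
    """Total bit flips between consecutive words sent over a link."""
    return sum(popcount(a ^ b) for a, b in zip(words, words[1:]))


def bucket_index(word: int, width: int, num_buckets: int) -> int:
    """Map a popcount in 0..width onto a bucket in 0..num_buckets-1.

    Assumed linear mapping; with num_buckets == width + 1 every
    popcount gets its own bucket (the 'precise' case).
    """
    return popcount(word) * num_buckets // (width + 1)


def bucket_sort_by_popcount(
    words: List[int], width: int = 8, num_buckets: int = 4
) -> List[int]:
    """Comparison-free ordering: drop each word into its popcount bucket,
    then read the buckets out in order (arrival order kept within a bucket)."""
    buckets: List[List[int]] = [[] for _ in range(num_buckets)]
    for w in words:
        buckets[bucket_index(w, width, num_buckets)].append(w)
    return [w for bucket in buckets for w in bucket]


if __name__ == "__main__":
    data = [0x00, 0xFF, 0x00, 0xFF]
    coarse = bucket_sort_by_popcount(data, width=8, num_buckets=4)
    print("BT before:", bit_transitions(data))
    print("BT after :", bit_transitions(coarse))
```

Fewer buckets mean less sorting state to hold, at the cost of a coarser ordering; this mirrors the area versus BT trade-off the abstract reports. Reordering by popcount is a heuristic, so the sketch does not guarantee fewer transitions for every input sequence.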
