EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary
Neuromorphic Vision Sensors
Neuromorphic vision sensors (NVS) have been recently explored to tackle scenarios where conventional sensors result in high data rate and processing time. This paper presents a hybrid event-frame approach for detecting and tracking objects recorded by a stationary neuromorphic sensor, thereby exploiting the sparse NVS output in a low-power setting for traffic monitoring. Specifically, we propose a hardware efficient processing pipeline that optimizes memory and computational needs. The usage of NVS gives the advantage of rejecting background while it has a unique disadvantage of fragmented objects. To exploit the background removal, we propose an event-based binary image creation that signals presence or absence of events in a frame duration. This reduces memory requirement and enables usage of simple algorithms like median filtering and connected component labeling for denoise and region proposal respectively. To overcome the fragmentation issue, a YOLO-inspired neural network based detector and classifier to merge fragmented region proposals has been proposed. Finally, an overlap based tracker exploiting overlap between detections and tracks is proposed with heuristics to overcome occlusion. The proposed pipeline is evaluated with more than 5 hours of traffic recording spanning three different locations on two different NVS and demonstrate similar performance. Compared to existing event-based feature trackers, our method provides similar accuracy while needing 6 times less computes. To the best of our knowledge, this is the first time a stationary NVS based traffic monitoring solution is extensively compared to simultaneously recorded RGB frame-methods while showing tremendous promise by outperforming state-of-the-art deep learning solutions.
View on arXiv