The fusion of sensor data is essential for a robust perception of the environment in autonomous driving. Learning-based fusion approaches mainly use feature-level fusion to achieve high performance, but their complexity and hardware requirements limit their applicability in near-production vehicles. High-level fusion methods offer robustness with lower computational requirements. Traditional methods, such as the Kalman filter, dominate this area. This paper modifies the Adapted Kalman Filter (AKF) and proposes a novel transformer-based high-level object fusion method called HiLO. Experimental results demonstrate improvements of percentage points in score and percentage points in mean IoU. Evaluation on a new large-scale real-world dataset demonstrates the effectiveness of the proposed approaches. Their generalizability is further validated by cross-domain evaluation between urban and highway scenarios. Code, data, and models are available at this https URL.
@article{osterburg2025_2506.02554,
  title={HiLO: High-Level Object Fusion for Autonomous Driving using Transformers},
  author={Timo Osterburg and Franz Albers and Christopher Diehl and Rajesh Pushparaj and Torsten Bertram},
  journal={arXiv preprint arXiv:2506.02554},
  year={2025}
}