Radar hits reflect from points both on the boundary of and interior to object outlines. This results in a complex distribution of radar hits that depends on factors including object category, size, and orientation. Current radar-camera fusion methods account for this only implicitly, through a black-box neural network. In this paper, we explicitly utilize a radar hit distribution model to assist fusion. First, we build a model to predict radar hit distributions conditioned on object properties obtained from a monocular detector. Second, we use the predicted distribution as a kernel to match actual measured radar points in the neighborhood of the monocular detections, generating matching scores at nearby positions. Finally, a fusion stage combines contextual cues with the kernel detector's output to refine the matching scores. Our method achieves state-of-the-art radar-camera detection performance on nuScenes. Our source code is available at this https URL.
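To make the kernel-matching step concrete, the following is a minimal sketch (not the authors' implementation) of scoring candidate object positions against measured radar hits in bird's-eye view. The names `kernel_matching_scores` and `toy_gaussian_kernel` are hypothetical, and a simple isotropic Gaussian stands in for the learned hit distribution that, in the paper, is predicted conditioned on monocular object properties.

```python
import numpy as np

def kernel_matching_scores(radar_points, candidate_positions, kernel_pdf):
    """Score candidate object centers by how well nearby measured radar
    hits agree with a predicted radar-hit distribution.

    radar_points:        (N, 2) array of measured radar hits in BEV (x, y).
    candidate_positions: (M, 2) array of candidate object centers in BEV.
    kernel_pdf:          callable mapping (N, 2) offsets (hit - center)
                         to (N,) predicted hit densities.
    """
    scores = np.zeros(len(candidate_positions))
    for i, center in enumerate(candidate_positions):
        offsets = radar_points - center        # hit positions relative to the center
        scores[i] = kernel_pdf(offsets).sum()  # accumulate kernel responses
    return scores

def toy_gaussian_kernel(offsets, sigma=1.0):
    """Toy stand-in for the learned, object-conditioned hit distribution."""
    d2 = (offsets ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Example: radar hits clustered near (10, 5) should score highest
# for the candidate position closest to that cluster.
hits = np.array([[10.2, 5.1], [9.8, 4.9], [10.5, 5.3], [30.0, 0.0]])
candidates = np.array([[10.0, 5.0], [20.0, 5.0]])
print(kernel_matching_scores(hits, candidates, toy_gaussian_kernel))
```

In the actual method, the kernel would vary per detection (conditioned on predicted category, size, and orientation), and the resulting scores would be refined by the fusion stage rather than used directly.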
@article{long2025_2504.09086,
  title   = {RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection},
  author  = {Yunfei Long and Abhinav Kumar and Xiaoming Liu and Daniel Morris},
  journal = {arXiv preprint arXiv:2504.09086},
  year    = {2025}
}