Explaining YOLO: Leveraging Grad-CAM to Explain Object Detections

Abstract
We investigate the problem of explainability for visual object detectors. Specifically, using the YOLO object detector as an example, we demonstrate how to integrate Grad-CAM into the model architecture and analyze the results. We show how to compute attribution-based explanations for individual detections and find that the normalization of the results strongly affects their interpretation.
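
To make the role of normalization concrete, the following is a minimal sketch of the standard Grad-CAM computation applied to two detections, followed by two ways of normalizing the resulting maps. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy activations and gradients, and the two normalization schemes are hypothetical, and a real setup would obtain the gradients by backpropagating a detection's score through the YOLO network.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Standard Grad-CAM for one detection.

    activations, gradients: arrays of shape (K, H, W), i.e. K feature maps
    and the gradient of one detection score with respect to each of them.
    """
    # Channel weights: spatial average of the gradients.
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted sum of the feature maps, then ReLU.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    return np.maximum(cam, 0.0)

def normalize_per_map(cam, eps=1e-8):
    """Scale each map by its own maximum (hypothetical scheme)."""
    return cam / (cam.max() + eps)

def normalize_jointly(cams, eps=1e-8):
    """Scale all maps by one shared maximum (hypothetical scheme)."""
    shared_max = max(c.max() for c in cams)
    return [c / (shared_max + eps) for c in cams]

# Toy data (assumed, not from the paper): two feature maps on a 3x3 grid.
activations = np.zeros((2, 3, 3))
activations[0, 1, 1] = 2.0
activations[1, 0, 0] = 1.0

# Two detections whose scores depend on the features with different strength.
grads_strong = np.full((2, 3, 3), 1.0)
grads_weak = np.full((2, 3, 3), 0.1)

cam_strong = grad_cam(activations, grads_strong)
cam_weak = grad_cam(activations, grads_weak)

per_map = [normalize_per_map(cam_strong), normalize_per_map(cam_weak)]
joint = normalize_jointly([cam_strong, cam_weak])

print("per-map maxima:", [float(m.max()) for m in per_map])
print("joint maxima:  ", [float(m.max()) for m in joint])
```

In this toy case, per-map normalization makes both heatmaps peak at roughly the same value, hiding the fact that the second detection's gradients are ten times weaker, while joint normalization preserves that difference. Which behavior is preferable depends on whether maps are meant to be compared across detections.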