Robust Physical Hard-Label Attacks on Deep Learning Visual Classification
Existing automated attack-generation algorithms for machine learning models for computer vision tend to focus on digital attacks within an ε-ball around the input, in white-box and black-box settings. However, in the real world, a more interesting class of attacks is those that are physically realizable -- say, by placing unobtrusive stickers on a traffic sign to cause a change in its classification. Given a model, generating such attacks automatically is still a challenge, even in white-box settings. We present GRAPHITE, an algorithm that automatically finds small areas in which to place robust adversarial perturbations in the black-box hard-label setting, where the attacker only has access to the model's predicted class label. Unlike algorithms for digital attacks that only aim to minimize perturbation based on an ℓp norm (typically ℓ2 or ℓ∞), the proposed algorithm automatically generates robust adversarial examples that (1) have a high success rate under multiple transformations that simulate, for example, viewpoint changes, and (2) occupy a small area of the image, so that they are more likely to be physically realizable as stickers. Using GRAPHITE, we successfully attack a stop sign so that it is misclassified as a 30 km/hr speed limit sign in 92.86% of physical test images, with fewer than 124k queries.
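As a rough illustration of the transform-robustness criterion described above, the sketch below scores a candidate sticker perturbation by the fraction of synthetically transformed images the model classifies as the attack target, using only hard-label queries. The function and parameter names (`transform_robustness`, `predict_label`, the toy transform set) are illustrative assumptions, not GRAPHITE's actual interface.

```python
import numpy as np

def transform_robustness(x_adv, x_clean, mask, target_label, predict_label,
                         transforms):
    """Score a candidate sticker perturbation under hard-label access only.

    x_adv, x_clean: HxWxC float arrays (perturbed and original images).
    mask:           HxW binary array confining the perturbation to a small,
                    sticker-like region.
    predict_label:  callable mapping an image to a predicted class label
                    (the only model access assumed in the hard-label setting).
    transforms:     callables simulating physical variation such as
                    viewpoint or lighting changes.
    Returns the fraction of transformed composites classified as the target.
    """
    # Apply the perturbation only inside the masked sticker region.
    composite = np.where(mask[..., None] > 0, x_adv, x_clean)
    hits = sum(predict_label(t(composite)) == target_label for t in transforms)
    return hits / len(transforms)

# Toy usage: brightness jitter as a stand-in for physical variation.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_clean = rng.random((32, 32, 3))
    x_adv = np.clip(x_clean + 0.5, 0.0, 1.0)
    mask = np.zeros((32, 32))
    mask[10:16, 10:16] = 1                            # small sticker region
    dummy_model = lambda img: int(img.mean() > 0.5)   # stand-in classifier
    jitters = [lambda img, s=s: np.clip(img * s, 0, 1)
               for s in np.linspace(0.8, 1.2, 10)]
    print(transform_robustness(x_adv, x_clean, mask, 1, dummy_model, jitters))
```

An attack in this setting would then search for a mask and perturbation that keep this score high while shrinking the masked area, which is what makes the result plausible as a physical sticker.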