Robust Physical Hard-Label Attacks on Deep Learning Visual Classification
Existing automated attack-generation algorithms for machine learning models for computer vision tend to focus on digital attacks within an ε-ball around the input, in white-box and black-box settings. However, in the real world, a more interesting class of attacks is those that are physically realizable -- say, by placing unobtrusive stickers on a traffic sign to cause a change in its classification. Given a model, generating such attacks automatically is still a challenge, even in white-box settings. We present GRAPHITE, an algorithm that automatically finds small areas in which to place robust adversarial perturbations in the black-box hard-label setting, where the attacker only has access to the model's predicted class label. Unlike algorithms for digital attacks that only aim to minimize perturbation based on an ℓp norm (typically ℓ2 or ℓ∞), the proposed algorithm automatically generates robust adversarial examples that (1) have a high success rate under multiple transformations that simulate, for example, viewpoint changes, and (2) occupy a small area of the image, so that they are more likely to be physically realizable as stickers. Using GRAPHITE, we successfully attack a stop sign so that it is misclassified as a 30 km/hr speed limit sign in 92.86% of physical test images, with fewer than 124k queries.
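As a rough illustration of the transform-robustness criterion described above, the sketch below scores a candidate sticker perturbation by the fraction of synthetically transformed images the model classifies as the attack target, using only hard-label queries. The function and parameter names (`transform_robustness`, `predict_label`, the toy transform set) are illustrative assumptions, not GRAPHITE's actual interface.

```python
import numpy as np

def transform_robustness(x_adv, x_clean, mask, target_label, predict_label,
                         transforms):
    """Score a candidate sticker perturbation under hard-label access only.

    x_adv, x_clean: HxWxC float arrays (perturbed and original images).
    mask:           HxW binary array confining the perturbation to a small,
                    sticker-like region.
    predict_label:  callable mapping an image to a predicted class label
                    (the only model access assumed in the hard-label setting).
    transforms:     callables simulating physical variation such as
                    viewpoint or lighting changes.
    Returns the fraction of transformed composites classified as the target.
    """
    # Apply the perturbation only inside the masked sticker region.
    composite = np.where(mask[..., None] > 0, x_adv, x_clean)
    hits = sum(predict_label(t(composite)) == target_label for t in transforms)
    return hits / len(transforms)

# Toy usage: brightness jitter as a stand-in for physical variation.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_clean = rng.random((32, 32, 3))
    x_adv = np.clip(x_clean + 0.5, 0.0, 1.0)
    mask = np.zeros((32, 32))
    mask[10:16, 10:16] = 1                            # small sticker region
    dummy_model = lambda img: int(img.mean() > 0.5)   # stand-in classifier
    jitters = [lambda img, s=s: np.clip(img * s, 0, 1)
               for s in np.linspace(0.8, 1.2, 10)]
    print(transform_robustness(x_adv, x_clean, mask, 1, dummy_model, jitters))
```

An attack in this setting would then search for a mask and perturbation that keep this score high while shrinking the masked area, which is what makes the result plausible as a physical sticker.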