
Multi-attacks: Many images + the same adversarial attack → many target labels

Abstract

We show that we can easily design a single adversarial perturbation $P$ that changes the class of $n$ images $X_1, X_2, \dots, X_n$ from their original, unperturbed classes $c_1, c_2, \dots, c_n$ to desired (not necessarily all the same) classes $c^*_1, c^*_2, \dots, c^*_n$ for up to hundreds of images and target classes at once. We call these \textit{multi-attacks}. Characterizing the maximum $n$ we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around $10^{\mathcal{O}(100)}$, posing a significant problem for exhaustive defense strategies. We show several immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decision boundaries in the pixel space, we look for its two-dimensional sections that trace images and spell words using particular classes. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.
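As a rough illustration of the idea, the sketch below optimizes one shared perturbation so that each image in a batch is pushed toward its own target class. It is a minimal, hypothetical reconstruction, not the authors' released code: it assumes a differentiable PyTorch classifier `model`, images scaled to [0, 1], and placeholder hyperparameters (`steps`, `lr`, `eps`).

```python
# Minimal sketch of a multi-attack: one perturbation P, many images,
# many (possibly different) target labels. All names and hyperparameters
# are illustrative assumptions, not the paper's exact implementation.
import torch
import torch.nn.functional as F

def multi_attack(model, images, target_labels, steps=500, lr=0.01, eps=0.1):
    """Optimize a single perturbation P so that model(X_i + P) = c*_i for all i."""
    model.eval()
    # One perturbation of shape (1, C, H, W), broadcast across the whole batch.
    delta = torch.zeros_like(images[:1], requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        perturbed = (images + delta).clamp(0.0, 1.0)
        logits = model(perturbed)
        # Push every image toward its own target class using the same delta.
        loss = F.cross_entropy(logits, target_labels)
        loss.backward()
        optimizer.step()
        # Optionally keep the perturbation small (L_inf ball of radius eps).
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    return delta.detach()
```

In this sketch, success simply means `model((images + delta).clamp(0, 1)).argmax(dim=1)` matching `target_labels` for all images; how large the batch can grow before this fails is the quantity the paper characterizes.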
