On Success and Simplicity: A Second Look at Transferable Targeted Attacks

Neural Information Processing Systems (NeurIPS), 2021
Abstract

There is broad consensus among researchers studying adversarial examples that it is extremely difficult to achieve transferable targeted attacks. Existing research strives for transferable targeted attacks by resorting to complex losses and even massive training. In this paper, we take a second look at transferable targeted attacks and show that their difficulty has been overestimated due to a blind spot in the conventional evaluation procedures: current work has unreasonably restricted attack optimization to only a few iterations. Here, we show that targeted attacks converge slowly to optimal transferability and improve considerably when given more iterations. We also demonstrate that an attack that simply maximizes the target logit performs surprisingly well, remarkably surpassing more complex losses and even achieving performance comparable to the state of the art, which requires massive training with a sophisticated multi-term loss. We further validate our logit attack in a realistic ensemble setting and in a real-world attack against the Google Cloud Vision API. The logit attack produces perturbations that reflect the target semantics, which we demonstrate allows us to create targeted universal adversarial perturbations without additional training images.
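To make the core idea concrete, the following is a minimal sketch of the kind of logit attack the abstract describes: iterative sign-gradient ascent on the target class's logit under an L-infinity budget, as in targeted I-FGSM but with the logit replacing the cross-entropy loss. The function name, the linear classifier standing in for a real network, and all hyperparameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def logit_attack(x, W, b, target, eps=16 / 255, alpha=2 / 255, n_iter=300):
    """Targeted attack that simply maximizes the target logit.

    A hypothetical linear model z = W @ x + b stands in for a CNN; with a
    real network, `grad` would be d(logit_target)/d(x) from backprop.
    Many iterations (n_iter=300) reflect the paper's point that targeted
    transferability converges slowly.
    """
    x_adv = x.copy()
    for _ in range(n_iter):
        # For the linear stand-in, the gradient of the target logit w.r.t.
        # the input is simply the target class's weight row.
        grad = W[target]
        x_adv = x_adv + alpha * np.sign(grad)     # ascend the target logit
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv

# Toy usage on random data (10 classes, 32-dimensional "image").
rng = np.random.default_rng(0)
W = rng.standard_normal((10, 32))
b = np.zeros(10)
x = rng.uniform(0.2, 0.8, size=32)
x_adv = logit_attack(x, W, b, target=3)
```

Because the logit (unlike a softmax cross-entropy loss) does not saturate as the target probability approaches one, gradient ascent keeps pushing the perturbation toward the target semantics over many iterations.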
