AT-GAN: A Generative Attack Model for Adversarial Transferring on Generative Adversarial Nets

16 April 2019

Abstract

We propose a new attack model called AT-GAN that transfers generative adversarial nets (GANs) to adversarial attacks. Unlike existing attacks that add small perturbations to input images, AT-GAN explores the distribution of adversarial instances so as to generate adversarial examples directly from any random noise. In this way, the generated adversarial examples are not limited to any natural images. Moreover, the pioneering GAN-based work performs iterations of gradient descent to search for a good noise vector in the neighborhood of an original random noise, such that the corresponding GAN output is an adversarial example; in contrast, our model is a generative model that tries to learn the distribution of adversarial examples. Once AT-GAN is trained, it can generate adversarial images, and the output is not limited to the input noise. Experiments show that AT-GAN is very fast and can generate plenty of adversarial instances that look more realistic to human eyes; that it yields a higher attack success rate under various adversarial training defenses in both semi-whitebox and black-box attack settings; and that it can learn a distribution of adversarial examples that is very close to the distribution of the real data.
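The contrast drawn above, between searching over the noise for each sample and learning a generator whose samples are already adversarial, can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear "generator" `W`, `b`, the linear target classifier `w`, `c`, the hinge attack loss, the closeness weight `lam`, and the helper names `search_noise` and `finetune_generator`. It shows only the difference in workflow (a per-sample optimization loop versus one-time training followed by plain sampling); it does not model visual realism or the actual AT-GAN objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): a fixed linear "generator"
# G(z) = W z + b and a linear target classifier with logit w.x + c.
# A sample counts as "fooling" the classifier when its logit is negative.
W = np.array([[1.0, 0.2], [0.1, 1.0]])
b = np.array([2.0, 2.0])
w = np.array([1.0, 1.0])
c = 0.0

def generate(z, W, b):
    """Map noise z (shape (2,) or (n, 2)) to samples."""
    return z @ W.T + b

def logit(x):
    return x @ w + c

def search_noise(z0, steps=200, lr=0.05):
    """Per-sample baseline: gradient descent on the noise itself."""
    z = z0.copy()
    grad = W.T @ w  # d logit(G(z)) / dz, constant for this linear toy
    for step in range(steps):
        if logit(generate(z, W, b)) < 0:
            return z, step
        z -= lr * grad
    return z, steps

def finetune_generator(steps=500, lr=0.05, batch=64, margin=1.0, lam=0.01):
    """Generative alternative: adapt a copy of the generator's bias once.

    Assumed loss: mean(max(0, logit + margin)) + lam * ||b_adv - b||^2.
    """
    b_adv = b.copy()
    for _ in range(steps):
        z = rng.normal(size=(batch, 2))
        x_adv = generate(z, W, b_adv)
        active = (logit(x_adv) + margin > 0).astype(float)
        grad_b = active.mean() * w + 2.0 * lam * (b_adv - b)
        b_adv -= lr * grad_b
    return b_adv

# Baseline: every new sample needs its own optimization loop.
z_found, steps_used = search_noise(np.zeros(2))

# Generative route: train once, then sample with a single forward pass.
b_adv = finetune_generator()
samples = generate(rng.normal(size=(1000, 2)), W, b_adv)
fool_rate = float((logit(samples) < 0).mean())
print(steps_used, fool_rate)
```

In this toy, the baseline pays the cost of `steps_used` gradient steps for each example, whereas the adapted generator produces a whole batch in one matrix multiplication after its one-time training.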
