Adversarial attacks on deep neural networks traditionally rely on a constrained optimization paradigm, where an optimization procedure is used to obtain a single adversarial perturbation for a given input example. In this work we frame the problem as learning a distribution of adversarial perturbations, enabling us to generate diverse adversarial distributions given an unperturbed input. We show that this framework is domain-agnostic in that the same framework can be employed to attack different input domains with minimal modification. Across three diverse domains---images, text, and graphs---our approach generates whitebox attacks with success rates that are competitive with or superior to existing approaches, with a new state-of-the-art achieved in the graph domain. Finally, we demonstrate that our framework can efficiently generate a diverse set of attacks for a single given input, and is even capable of attacking \textit{unseen} test instances in a zero-shot manner, exhibiting attack generalization.
View on arXiv