
Attacking the Madry Defense Model with $L_1$-based Adversarial Examples

Abstract

The Madry Lab recently hosted a competition designed to test the robustness of their adversarially trained MNIST model. Attacks were constrained to perturb each pixel of the input image by a scaled maximal $L_\infty$ distortion $\epsilon = 0.3$. This discourages the use of attacks which are not optimized on the $L_\infty$ distortion metric. Our experimental results demonstrate that by relaxing the $L_\infty$ constraint of the competition, the elastic-net attack to deep neural networks (EAD) can generate transferable adversarial examples which, despite their high average $L_\infty$ distortion, have minimal visual distortion. These results call into question the use of $L_\infty$ as a sole measure for visual distortion, and further demonstrate the power of EAD at generating robust adversarial examples.
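To make the elastic-net idea concrete, the following is a minimal sketch of an EAD-style attack objective, $c \cdot f(x') + \beta \|x' - x\|_1 + \|x' - x\|_2^2$, minimized with ISTA-style iterations. The toy linear classifier `W`, the hinge-style targeted loss, and the hyperparameters `c`, `beta`, and `lr` are illustrative assumptions only; the published EAD attack runs a FISTA-based solver against a trained network with a search over the regularization constant, not this simplified setup.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784)) * 0.01    # hypothetical linear "classifier" weights


def logits(x):
    return W @ x


def smooth_grad(x_adv, x_orig, target, c):
    """Gradient of the smooth terms: c * targeted hinge loss + ||x_adv - x_orig||_2^2."""
    z = logits(x_adv)
    z_masked = z.copy()
    z_masked[target] = -np.inf
    other = int(np.argmax(z_masked))      # strongest competing class
    grad = 2.0 * (x_adv - x_orig)         # gradient of the L2 penalty
    if z[other] - z[target] > 0:          # hinge is active: push target logit above the rest
        grad += c * (W[other] - W[target])
    return grad


def soft_threshold(v, lam):
    """Proximal operator of the L1 penalty (element-wise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def ead_ista(x_orig, target, steps=200, lr=0.1, c=1.0, beta=1e-2):
    """ISTA loop: gradient step on the smooth loss, then shrink the perturbation."""
    x_adv = x_orig.copy()
    for _ in range(steps):
        x_adv = x_adv - lr * smooth_grad(x_adv, x_orig, target, c)
        # apply the L1 shrinkage to the perturbation, not to the image itself
        x_adv = x_orig + soft_threshold(x_adv - x_orig, lr * beta)
        x_adv = np.clip(x_adv, 0.0, 1.0)  # keep pixel values in a valid range
    return x_adv


x = rng.uniform(size=784)                 # stand-in for a flattened MNIST image
x_adv = ead_ista(x, target=3)
print("L1   distortion:", np.abs(x_adv - x).sum())
print("Linf distortion:", np.abs(x_adv - x).max())
print("predicted class:", int(np.argmax(logits(x_adv))))
```

Because only the L1 term is shrunk in the proximal step, the resulting perturbation tends to be sparse: a few pixels may change substantially (large $L_\infty$ distortion) while most remain untouched, which is the behavior the abstract exploits against the $L_\infty$-constrained competition setting.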
