40
25

Towards Feature Space Adversarial Attack

Abstract

We propose a new adversarial attack to Deep Neural Networks for image classification. Different from most existing attacks that directly perturb input pixels, our attack focuses on perturbing abstract features, more specifically, features that denote styles, including interpretable styles such as vivid colors and sharp outlines, and uninterpretable ones. It induces model misclassfication by injecting imperceptible style changes through an optimization procedure. We show that our attack can generate adversarial samples that are more natural-looking than the state-of-the-art unbounded attacks. The experiment also supports that existing pixel-space adversarial attack detection and defense techniques can hardly ensure robustness in the style related feature space.

View on arXiv
Comments on this paper

We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from. See our policy.