
BUZz: BUffer Zones for defending adversarial examples in image classification

Abstract

We propose a novel defense against all existing gradient-based adversarial attacks on deep neural networks for image classification. Our defense combines deep neural networks with simple image transformations. While straightforward to implement, it yields a unique security property that we term buffer zones. We argue that this buffer-zone defense offers significant improvements over state-of-the-art defenses, even when the adversary has access to the entire original training data set and unlimited query access to the defense. We verify our claim through experiments on Fashion-MNIST and CIFAR-10: we demonstrate an attack success rate below 11% -- significantly lower than what other well-known state-of-the-art defenses offer -- at the price of only an 11-18% drop in clean accuracy. Using a new, intuitive metric, we explain why this trade-off is a significant improvement over prior work.
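The abstract only sketches the construction, so the following is a minimal illustrative reading rather than the authors' exact method: it assumes the defense runs several networks, each on a differently transformed copy of the input, and treats any disagreement among them as an input landing in a buffer zone that is rejected. The names buffered_predict, classifiers, and transforms are hypothetical.

import numpy as np

def buffered_predict(x, classifiers, transforms):
    # Each classifier sees its own fixed, simple transformation of the input image.
    votes = [int(np.argmax(f(t(x)))) for f, t in zip(classifiers, transforms)]
    # Unanimous agreement -> accept the label; any disagreement -> the input
    # falls in a buffer zone between the models and is flagged as adversarial.
    return votes[0] if len(set(votes)) == 1 else -1

Under this reading, the drop in clean accuracy comes from benign inputs that occasionally fall into the disagreement region, while adversarial examples must fool every transformed classifier simultaneously to succeed.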
