97
2

Adversarial Robustness: What fools you makes you stronger

Abstract

We prove an exponential separation for the sample complexity between the standard PAC-learning model and a version of the Equivalence-Query-learning model. We then show that this separation has interesting implications for adversarial robustness. We explore a vision of designing an adaptive defense that in the presence of an attacker computes a model that is provably robust. In particular, we show how to realize this vision in a simplified setting. In order to do so, we introduce a notion of a strong adversary: he is not limited by the type of perturbations he can apply but when presented with a classifier can repetitively generate different adversarial examples. We explain why this notion is interesting to study and use it to prove the following. There exists an efficient adversarial-learning-like scheme such that for every strong adversary A\mathbf{A} it outputs a classifier that (a) cannot be strongly attacked by A\mathbf{A}, or (b) has error at most ϵ\epsilon. In both cases our scheme uses exponentially (in ϵ\epsilon) fewer samples than what the PAC bound requires.

View on arXiv
Comments on this paper