Understanding Frank-Wolfe Adversarial Training

22 December 2020

Abstract

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique that approximately solves a robust optimization problem to minimize the worst-case loss and is widely regarded as the most effective defense against such attacks. We develop a theoretical framework for adversarial training with FW optimization (FW-AT) that reveals a geometric connection between the loss landscape and the distortion of $\ell_\infty$ FW attacks (the attack's $\ell_2$ norm). Specifically, we show that high distortion of FW attacks is equivalent to low variation along the attack path. It is then experimentally demonstrated on various deep neural network architectures that $\ell_\infty$ attacks against robust models achieve near maximal $\ell_2$ distortion. This mathematical transparency differentiates FW from the more popular Projected Gradient Descent (PGD) optimization. To demonstrate the utility of our theoretical framework we develop FW-Adapt, a novel adversarial training algorithm which uses simple distortion measure to adaptively change number of attack steps during training. FW-Adapt provides strong robustness at lower training times in comparison to PGD-AT for a variety of white-box and black-box attacks.

View on arXiv

Comments on this paper