Understanding Frank-Wolfe Adversarial Training
- AAML

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT), which approximately solves a robust optimization problem to minimize the worst-case loss, is widely regarded as the most effective defense against such attacks. While projected gradient descent (PGD) has received the most attention for approximately solving the inner maximization of AT, Frank-Wolfe (FW) optimization is projection-free and can be adapted to any norm. A Frank-Wolfe adversarial training approach is presented and shown to provide robustness competitive with PGD-AT across a variety of architectures, attacks, and datasets. Exploiting a representation of the FW attack, we derive a geometric insight: the larger the norm of an attack, the smaller the variation of the loss gradient along the attack path. It is then demonstrated experimentally that attacks against robust models achieve near-maximal distortion, providing a new lens into the specific type of regularization that AT bestows. Using FW optimization in conjunction with robust models, we are able to generate sparse, human-interpretable counterfactual explanations without relying on expensive projections.
View on arXiv
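
To make the projection-free property concrete, here is a minimal PyTorch sketch of a Frank-Wolfe ℓ∞ attack. The function name `fw_linf_attack`, the step-size schedule γ_t = 2/(t+2), the [0, 1] pixel range, and the hyperparameter defaults are illustrative assumptions, not details from the paper: each step calls a linear maximization oracle over the constraint set and moves via a convex combination, so every iterate stays feasible and no projection is ever needed.

```python
import torch
import torch.nn.functional as F

def fw_linf_attack(model, x, y, eps=8 / 255, steps=10):
    """Minimal Frank-Wolfe l-inf attack sketch (illustrative, not the
    paper's exact algorithm). Maximizes the loss over the intersection
    of the l-inf ball of radius eps around x and the [0, 1] pixel box."""
    # Coordinate-wise bounds of the feasible set (a box, so the linear
    # maximization oracle has a closed form).
    lo = (x - eps).clamp(min=0.0)
    hi = (x + eps).clamp(max=1.0)
    x_adv = x.clone().detach()
    for t in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Linear maximization oracle: argmax_v <grad, v> over the box
        # picks the upper bound wherever the gradient is positive and
        # the lower bound elsewhere.
        v = torch.where(grad > 0, hi, lo)
        # Convex combination of two feasible points stays feasible --
        # this is what makes Frank-Wolfe projection-free.
        gamma = 2.0 / (t + 2)  # standard FW step-size schedule (assumed)
        x_adv = ((1 - gamma) * x_adv + gamma * v).detach()
    return x_adv
```

In an FW-AT loop, these adversarial examples would be used exactly as in PGD-AT, i.e., each training batch minimizes `F.cross_entropy(model(fw_linf_attack(model, x, y)), y)`; only the inner maximizer changes.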