SAIF: Sparse Adversarial and Imperceptible Attack Framework

Abstract

Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of small, calculated distortions to an image, for instance, can deceive a well-trained image classifier. In this work, we propose a novel attack technique called the Sparse Adversarial and Imperceptible Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels, and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity, with O(1/√T) convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
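
The abstract's central technical step is the Frank-Wolfe (conditional gradient) method, which stays inside a constraint set by repeatedly moving toward the vertex returned by a linear oracle, rather than projecting. Below is a minimal PyTorch sketch of one plausible instantiation for an untargeted attack; the product parameterization p * s (perturbation times per-pixel mask), the constraint radii eps and k, the nonzero initialization, and the classic 2/(t+2) step size are illustrative assumptions, not SAIF's exact formulation.

import torch
import torch.nn.functional as F

def frank_wolfe_sparse_attack(model, x, y, eps=8/255, k=500, T=100):
    # Ascend the cross-entropy loss of `model` on (x, y) over perturbations
    # p with ||p||_inf <= eps and pixel masks s with 0 <= s <= 1, sum(s) <= k.
    # Assumes x is a (1, C, H, W) image in [0, 1] and y an integer label.
    n = x.shape[2] * x.shape[3]
    # Nonzero, feasible initialization: p = s = 0 is a stationary point of
    # the product parameterization, so Frank-Wolfe would never leave it.
    p = ((torch.rand_like(x) * 2 - 1) * eps).requires_grad_()
    s = torch.full((n,), k / n, device=x.device, requires_grad=True)

    for t in range(T):
        mask = s.view(1, 1, x.shape[2], x.shape[3])  # broadcast over channels
        loss = F.cross_entropy(model(x + p * mask), y)
        g_p, g_s = torch.autograd.grad(loss, (p, s))

        # Linear maximization oracles (closed form, so no projection needed):
        # over the L-inf ball, <v, g> is maximized by v = eps * sign(g);
        v_p = eps * torch.sign(g_p)
        # over {0 <= s <= 1, sum(s) <= k}, by putting 1 on the (at most k)
        # coordinates with the largest positive gradient entries.
        v_s = torch.zeros_like(s)
        idx = torch.topk(g_s, k).indices
        v_s[idx] = (g_s[idx] > 0).float()

        gamma = 2.0 / (t + 2)  # classic Frank-Wolfe step-size schedule
        with torch.no_grad():  # convex combination keeps iterates feasible
            p.add_(gamma * (v_p - p))
            s.add_(gamma * (v_s - s))

    with torch.no_grad():
        return (x + p * s.view(1, 1, x.shape[2], x.shape[3])).clamp(0, 1)

A projection-free method fits these constraints well: the linear oracle over both the L-inf ball and the capped simplex has a closed form, and each convex-combination update keeps the joint iterate feasible, which is what makes simultaneous optimization of magnitude and sparsity tractable.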

@article{imtiaz2025_2212.07495,
  title={SAIF: Sparse Adversarial and Imperceptible Attack Framework},
  author={Tooba Imtiaz and Morgan Kohler and Jared Miller and Zifeng Wang and Masih Eskander and Mario Sznaier and Octavia Camps and Jennifer Dy},
  journal={arXiv preprint arXiv:2212.07495},
  year={2025}
}