Breaking Dataset Boundaries: Class-Agnostic Targeted Adversarial Attacks

Abstract

We present Cross-Domain Multi-Targeted Attack (CD-MTA), a method for generating adversarial examples that mislead image classifiers toward any target class, including classes not seen during training. Traditional targeted attacks are limited to one class per model, requiring expensive retraining for each target. Multi-targeted attacks address this by introducing a perturbation generator with a conditional input to specify the target class. However, existing methods are constrained to classes observed during training and require access to the black-box model's training data, introducing a form of data leakage that undermines realistic evaluation in practical black-box scenarios. We identify overreliance on class embeddings as a key limitation, leading to overfitting and poor generalization to unseen classes. To address this, CD-MTA replaces class-level supervision with an image-based conditional input and introduces class-agnostic losses that align the perturbed and target images in the feature space. This design removes dependence on class semantics, thereby enabling generalization to unseen classes across datasets. Experiments on ImageNet and seven other datasets show that CD-MTA outperforms prior multi-targeted attacks in both standard and cross-domain settings, without accessing the black-box model's training data.
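The core idea of replacing class-level supervision with feature-space alignment can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual loss: it treats feature extraction as given and scores how closely the perturbed image's features match the target image's features, with no reference to class labels or embeddings.

```python
import math

def feature_alignment_loss(f_perturbed, f_target):
    """Hypothetical class-agnostic loss: 1 - cosine similarity between
    the perturbed image's feature vector and the target image's feature
    vector. Lower values mean the two images are closer in feature space;
    no class label or class embedding is involved."""
    dot = sum(a * b for a, b in zip(f_perturbed, f_target))
    norm_p = math.sqrt(sum(a * a for a in f_perturbed))
    norm_t = math.sqrt(sum(b * b for b in f_target))
    return 1.0 - dot / (norm_p * norm_t)
```

Because the supervision signal is a target *image* rather than a class index, the same trained generator can, in principle, be conditioned on images from datasets and classes it never saw during training.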

@article{gonçalves2025_2505.20782,
  title={Breaking Dataset Boundaries: Class-Agnostic Targeted Adversarial Attacks},
  author={Taïga Gonçalves and Tomo Miyazaki and Shinichiro Omachi},
  journal={arXiv preprint arXiv:2505.20782},
  year={2025}
}