Adversarial Quantum Machine Learning: An Information-Theoretic Generalization Analysis

In a manner analogous to their classical counterparts, quantum classifiers are vulnerable to adversarial attacks that perturb their inputs. A promising countermeasure is to train the quantum classifier by adopting an attack-aware, or adversarial, loss function. This paper studies the generalization properties of quantum classifiers that are adversarially trained against bounded-norm white-box attacks. Specifically, a quantum adversary maximizes the classifier's loss by transforming an input state into a state that is $\epsilon$-close to the original state in $p$-Schatten distance. Under suitable assumptions on the quantum embedding, we derive novel information-theoretic upper bounds on the generalization error of adversarially trained quantum classifiers for $p=1$ and $p=\infty$. The derived upper bounds consist of two terms: the first is an exponential function of the 2-Rényi mutual information between the classical data and the quantum embedding, while the second term scales linearly with the adversarial perturbation size $\epsilon$. Both terms are shown to decrease as $1/\sqrt{T}$ with the training set size $T$. An extension is also considered in which the adversary assumed during training has a different perturbation size and Schatten norm than the adversary affecting the test inputs. Finally, we validate our theoretical findings with numerical experiments in a synthetic setting.
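For orientation, the attack model and the overall shape of the derived bounds can be sketched as follows. This is a schematic reading of the abstract rather than the paper's exact statement; the notation $\ell$, $\tilde{\ell}_\epsilon$, $\rho(x)$, $I_2$, $c_1$, and $c_2$ is assumed here purely for illustration:
$$
\tilde{\ell}_\epsilon(x,y) \;=\; \max_{\sigma:\;\|\sigma-\rho(x)\|_p \,\le\, \epsilon} \ell(\sigma, y),
\qquad
\overline{\mathrm{gen}} \;\lesssim\; \frac{c_1\,\exp\!\big(I_2(X;\rho(X))\big)}{\sqrt{T}} \;+\; \frac{c_2\,\epsilon}{\sqrt{T}},
$$
where the adversarial loss $\tilde{\ell}_\epsilon$ takes the worst case over states within $\epsilon$ of the embedded input in $p$-Schatten distance, the first term in the bound is an exponential function of the 2-Rényi mutual information between the classical data and the quantum embedding, the second term is linear in $\epsilon$, and both terms decay as $1/\sqrt{T}$ with the training set size $T$.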