
Adversarial Quantum Machine Learning: An Information-Theoretic Generalization Analysis

Abstract

In a manner analogous to their classical counterparts, quantum classifiers are vulnerable to adversarial attacks that perturb their inputs. A promising countermeasure is to train the quantum classifier by adopting an attack-aware, or adversarial, loss function. This paper studies the generalization properties of quantum classifiers that are adversarially trained against bounded-norm white-box attacks. Specifically, a quantum adversary maximizes the classifier's loss by transforming an input state $\rho(x)$ into a state $\lambda$ that is $\epsilon$-close to the original state $\rho(x)$ in $p$-Schatten distance. Under suitable assumptions on the quantum embedding $\rho(x)$, we derive novel information-theoretic upper bounds on the generalization error of adversarially trained quantum classifiers for $p = 1$ and $p = \infty$. The derived upper bounds consist of two terms: the first is an exponential function of the 2-Rényi mutual information between classical data and quantum embedding, while the second term scales linearly with the adversarial perturbation size $\epsilon$. Both terms are shown to decrease as $1/\sqrt{T}$ over the training set size $T$. An extension is also considered in which the adversary assumed during training has different parameters $p$ and $\epsilon$ as compared to the adversary affecting the test inputs. Finally, we validate our theoretical findings with numerical experiments for a synthetic setting.
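To make the attack model concrete, the sketch below illustrates a bounded-norm white-box adversary of the kind described in the abstract: given a quantum embedding $\rho(x)$, it searches for a valid density matrix $\lambda$ within an $\epsilon$-ball in Schatten-$p$ distance that maximizes a loss linear in the state. This is a minimal illustrative example, not the paper's construction: the toy single-qubit embedding, the loss $\mathrm{Tr}[M\lambda]$, the random-search maximization, and the function names (`embed`, `worst_case_state`, `schatten_norm`) are all assumptions made for this sketch.

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten-p norm: the l_p norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.max(s) if np.isinf(p) else np.sum(s ** p) ** (1.0 / p)

def embed(x):
    """Toy single-qubit angle embedding rho(x), slightly mixed so that
    small Hermitian perturbations can keep it a valid density matrix."""
    psi = np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex)
    pure = np.outer(psi, psi.conj())
    return 0.9 * pure + 0.1 * np.eye(2) / 2

def worst_case_state(rho, M, eps, p=1, n_trials=2000, seed=0):
    """Random-search sketch of a bounded-norm white-box attack: find a
    density matrix lam with ||lam - rho||_p <= eps maximizing Tr[M lam]."""
    rng = np.random.default_rng(seed)
    best, best_loss = rho, np.trace(M @ rho).real
    for _ in range(n_trials):
        # Random traceless Hermitian direction, rescaled onto the eps-sphere.
        G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
        H = (G + G.conj().T) / 2
        H -= np.trace(H).real / 2 * np.eye(2)
        delta = eps * H / schatten_norm(H, p)
        lam = rho + delta
        # Keep only candidates that remain valid quantum states (PSD, trace 1).
        if np.min(np.linalg.eigvalsh(lam)) < 0:
            continue
        loss = np.trace(M @ lam).real
        if loss > best_loss:
            best, best_loss = lam, loss
    return best, best_loss

if __name__ == "__main__":
    rho = embed(0.3)
    M = np.array([[0.0, 0.0], [0.0, 1.0]])  # toy loss: probability of the wrong label
    lam, attacked_loss = worst_case_state(rho, M, eps=0.05, p=1)
    print("clean loss   :", np.trace(M @ rho).real)
    print("attacked loss:", attacked_loss)
```

In practice one would replace the random search with a projected optimization over the $\epsilon$-ball; the sketch only demonstrates the constraint set, namely Hermitian, trace-preserving perturbations of bounded Schatten-$p$ norm that leave the state positive semidefinite.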
