We propose a scheme that mitigates the adversarial perturbation $Z$ on an adversarial example $X^{adv}$ ($= X + Z$, where $X$ is a benign sample) by subtracting the estimated perturbation from, or adding it to, $X^{adv}$. The estimated perturbation $\hat{Z}$ comes from the difference between $X^{adv}$ and its moving-averaged outcome $W \ast X^{adv}$, where $W$ is a moving-average kernel whose coefficients are all equal. Usually, the adjacent samples of an image are close to each other, such that we can let $X - W \ast X \approx 0$ (naming this relation X-MAS, for X Minus Moving Averaged Samples). By doing that, we can make the estimated perturbation $\hat{Z} = X^{adv} - W \ast X^{adv} \approx Z - W \ast Z$ fall within the range of $0$ to $2Z$. The scheme is also extended to multi-level mitigation by configuring the mitigated adversarial example as a new adversarial example to be mitigated. The multi-level mitigation gets closer to $X$ with a smaller (i.e., mitigated) perturbation than the original unmitigated perturbation $Z$ by setting the moving-averaged adversarial sample $W \ast X^{adv}$ (which carries a smaller perturbation than $Z$ if $W \ast X \approx X$) as the boundary condition that the multi-level mitigation cannot cross over (i.e., decreasing pixels cannot go below $W \ast X^{adv}$ and increasing pixels cannot go beyond $W \ast X^{adv}$). With the multi-level mitigation, we can obtain high prediction accuracy even for adversarial examples with a large perturbation (i.e., a large $Z$). The proposed scheme is evaluated with adversarial examples crafted by FGSM (Fast Gradient Sign Method)-based attacks on ResNet-50 trained with the ImageNet dataset.
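As a concrete illustration, the sketch below shows one way the described mitigation could be realized in NumPy/SciPy for an $H \times W \times C$ image in $[0, 1]$: the perturbation is estimated as the difference between the image and its box-filtered moving average, subtracted, and the result is clamped so no pixel crosses the moving-averaged adversarial sample $W \ast X^{adv}$. This is a minimal sketch of one reading of the abstract, not the paper's implementation; the kernel size, reflect padding, per-channel filtering, exact clamping rule, and the helper names `box_average` / `xmas_mitigate` are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def box_average(x, k=3):
    """Moving average with an all-equal (box) kernel, applied per channel.

    Assumes x is an H x W x C float image; size=(k, k, 1) avoids averaging
    across channels.
    """
    return uniform_filter(x, size=(k, k, 1), mode="reflect")


def xmas_mitigate(x_adv, k=3, levels=1):
    """Multi-level X-MAS-style mitigation (illustrative sketch).

    Each level estimates the perturbation as the difference between the
    current image and its moving average, subtracts that estimate, and
    clamps the result so no pixel crosses the moving-averaged adversarial
    sample, which serves as the boundary condition.
    """
    x_adv = x_adv.astype(np.float64)
    x_ma = box_average(x_adv, k)              # boundary: moving-averaged adversarial sample
    lower = np.minimum(x_adv, x_ma)           # decreasing pixels must not go below the boundary
    upper = np.maximum(x_adv, x_ma)           # increasing pixels must not go beyond the boundary
    x = x_adv
    for _ in range(levels):
        z_hat = x - box_average(x, k)         # estimated perturbation: X minus Moving Averaged Samples
        x = np.clip(x - z_hat, lower, upper)  # mitigate, then enforce the boundary condition
    return x
```

With `levels=1` this reduces to a (clamped) box filtering of the adversarial image; larger `levels` treat the previously mitigated image as a new adversarial example, re-estimating and re-subtracting the perturbation while staying inside the band bounded by $X^{adv}$ and $W \ast X^{adv}$.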