Mitigating Adversarial Attacks by Distributing Different Copies to Different Users
- AAML

Machine learning models are vulnerable to adversarial attacks. In this paper, we consider the scenario where a model is distributed to many users, among whom a malicious user attempts to attack another user. The malicious user probes its own copy of the model to search for adversarial samples and then presents the samples it finds to the victim's copy in order to replicate the attack. By distributing different copies of the model to different users, we can mitigate such attacks, since adversarial samples found on one copy would not work on another. We propose a flexible parameter rewriting method that directly modifies the model's parameters. This method requires no training and can generate a large number of copies, each of which induces a different set of adversarial samples. Experimental studies show that our approach significantly mitigates such attacks while retaining high accuracy.
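The abstract does not specify the actual rewriting rule, so the snippet below is only a minimal sketch of the general idea: starting from one trained model, produce many per-user copies by directly perturbing the parameters with a user-specific seed, with no retraining. The seeded Gaussian perturbation used here is an illustrative stand-in, not the paper's method, and `make_user_copy` and its `scale` parameter are hypothetical names.

```python
import copy
import torch
import torch.nn as nn

def make_user_copy(model: nn.Module, user_seed: int, scale: float = 1e-2) -> nn.Module:
    """Create a distinct per-user copy of a trained model by directly
    modifying its parameters (no retraining).

    NOTE: the specific rewriting rule here (small seeded random noise added
    to the weights) is an illustrative placeholder, not the paper's scheme.
    """
    user_model = copy.deepcopy(model)
    gen = torch.Generator().manual_seed(user_seed)
    with torch.no_grad():
        for p in user_model.parameters():
            # Seeded perturbation: each user_seed yields a different copy.
            p.add_(torch.randn(p.shape, generator=gen) * scale)
    return user_model

# Example: distribute distinct copies of one base model to users 0..9.
base_model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
user_models = {uid: make_user_copy(base_model, user_seed=uid) for uid in range(10)}
```

Under this setup, an adversarial sample crafted against `user_models[3]` would be evaluated against a differently perturbed `user_models[7]`, which is the transfer scenario the paper aims to break; how well a given rewriting rule preserves accuracy while decorrelating adversarial samples is exactly what the paper's experiments measure.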