Robust DNN Watermarking via Fixed Embedding Weights with Optimized Distribution

Watermarking has been proposed as a way to protect the Intellectual Property Rights of Deep Neural Networks and track their use. Several methods have been proposed to embed the watermark into the trainable parameters of the network (white-box watermarking) or into the input-output mapping implemented by the network in correspondence with specific inputs (black-box watermarking). In both cases, achieving robustness against fine-tuning, model compression and, even more, transfer learning, is one of the most difficult challenges researchers face. In this paper, we propose a new white-box, multi-bit watermarking algorithm with strong robustness properties, including robustness against retraining for transfer learning. Robustness is achieved thanks to a new embedding strategy according to which the watermark message is spread across a number of fixed weights, whose positions depend on a secret key. The weights hosting the watermark are set prior to training and are left unchanged throughout the training procedure. The distribution of the weights carrying the watermark is theoretically optimized to make sure that they are indistinguishable from the non-watermarked weights, while at the same time setting their amplitude to values as large as possible to improve robustness against retraining. We carried out several experiments demonstrating the capability of the proposed scheme to provide high payloads with no significant impact on network accuracy, while ensuring excellent robustness against network modification and re-use, including retraining and transfer learning.
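To make the embedding strategy concrete, the following is a minimal PyTorch sketch of the general idea described above: host weights are selected by a key-seeded PRNG, the message bits are written into them before training, and a gradient mask keeps them fixed during training. The function names, the antipodal (sign-based) bit encoding, and the fixed `amplitude` parameter are illustrative assumptions; in particular, the paper's theoretically optimized weight distribution is not reproduced here.

```python
# Illustrative sketch, not the authors' implementation.
import numpy as np
import torch

def embed_watermark(layer: torch.nn.Linear, bits: np.ndarray,
                    key: int, amplitude: float = 0.05):
    """Write `bits` into weights chosen by a secret key, then freeze
    those weights by zeroing their gradients during backprop."""
    rng = np.random.default_rng(key)            # secret-key-seeded PRNG
    flat = layer.weight.detach().view(-1)       # shares storage with the layer
    idx = rng.choice(flat.numel(), size=len(bits), replace=False)
    idx_t = torch.as_tensor(idx, dtype=torch.long)

    # Antipodal embedding: bit 1 -> +amplitude, bit 0 -> -amplitude.
    # (A stand-in for the paper's optimized amplitude distribution.)
    signs = torch.as_tensor(2.0 * bits - 1.0, dtype=flat.dtype)
    with torch.no_grad():
        flat[idx_t] = amplitude * signs

    # Mask gradients so training never updates the host weights.
    mask = torch.ones_like(layer.weight)
    mask.view(-1)[idx_t] = 0.0
    layer.weight.register_hook(lambda g: g * mask)
    return idx

def extract_watermark(layer: torch.nn.Linear, idx) -> np.ndarray:
    """Recover the message by reading the signs of the host weights."""
    flat = layer.weight.detach().view(-1)
    return (flat[torch.as_tensor(idx, dtype=torch.long)] > 0).numpy().astype(int)
```

Under this sketch, extraction reduces to re-deriving the host positions from the secret key and reading the weight signs, which is why robustness hinges on the host weights staying (approximately) fixed under retraining.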