
Evaluating Inexact Unlearning Requires Revisiting Forgetting

Main: 10 pages · Appendix: 15 pages · Bibliography: 5 pages · 11 figures · 8 tables
Abstract

Existing methods in inexact unlearning are evaluated by measuring indistinguishability from models retrained after removing the deletion set. We argue that achieving indistinguishability is unnecessary and that its practical relaxations are insufficient. We instead formulate the goal of unlearning as forgetting all information specific to the deletion set while maintaining high utility and resource efficiency. We introduce a novel test for forgetting called Interclass Confusion (IC). Despite being a black-box test, IC can investigate whether information from the deletion set was erased even in the early layers of the network. We analyze two aspects of forgetting: (i) memorization and (ii) property generalization. We empirically show that two simple unlearning methods, exact-unlearning and catastrophic-forgetting the final k layers of a network, outperform prior unlearning methods when scaled to large deletion sets. Overall, we believe our formulation and the IC test will guide the design of better unlearning algorithms.
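To make the IC idea concrete, here is a minimal sketch of the confusion metric one might compute in such a test. This is an illustrative approximation, not the paper's exact protocol: the function name `ic_error`, the two-class setup, and the toy data below are all assumptions. The intuition from the abstract is that labels of two classes are deliberately confused within the deletion set before training; after unlearning, residual confusion between those classes on held-out data suggests information from the deletion set was not fully erased.

```python
# Hypothetical sketch of an Interclass Confusion (IC) style metric.
# Setup (assumed): two classes A and B had their labels swapped inside
# the deletion set before training. After unlearning, we check how often
# the model still predicts A/B samples as the *other* class.

def ic_error(predictions, labels, class_a, class_b):
    """Fraction of held-out A/B samples predicted as the other class.

    High residual confusion after unlearning hints that deletion-set
    information survived; near-zero confusion is consistent with forgetting.
    """
    confused = total = 0
    for pred, true in zip(predictions, labels):
        if true == class_a:
            total += 1
            confused += (pred == class_b)
        elif true == class_b:
            total += 1
            confused += (pred == class_a)
    return confused / total if total else 0.0

# Toy usage: class 0 and class 1 were confused; class 2 is untouched.
labels      = [0, 0, 1, 1, 2]
predictions = [1, 0, 0, 1, 2]   # two of the four class-0/1 samples flip
print(ic_error(predictions, labels, class_a=0, class_b=1))  # 0.5
```

Because the metric only needs model predictions, it stays black-box, matching the abstract's claim that IC requires no access to internal weights.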
