
Evaluating Inexact Unlearning Requires Revisiting Forgetting

Main: 10 pages · Appendix: 15 pages · Bibliography: 5 pages · 11 figures · 8 tables
Abstract

Existing methods in inexact unlearning are evaluated by measuring indistinguishability from models retrained after removing the deletion set. We argue that achieving indistinguishability is unnecessary and that its practical relaxations are insufficient. We instead formulate the goal of unlearning as forgetting all information specific to the deletion set while maintaining high utility and resource efficiency. We introduce a novel test for forgetting called Interclass Confusion (IC). Despite being a black-box test, IC can investigate whether information from the deletion set was erased even in the early layers of the network. We analyze two aspects of forgetting: (i) memorization and (ii) property generalization. We empirically show that two simple unlearning methods, exact-unlearning and catastrophic-forgetting the final k layers of a network, outperform prior unlearning methods when scaled to large deletion sets. Overall, we believe our formulation and the IC test will guide the design of better unlearning algorithms.
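To make the IC idea concrete, here is a minimal sketch of the confusion metric one might compute in such a test. This is an illustrative approximation, not the paper's exact protocol: the function name `ic_error`, the two-class setup, and the toy data below are all assumptions. The intuition from the abstract is that labels of two classes are deliberately confused within the deletion set before training; after unlearning, residual confusion between those classes on held-out data suggests information from the deletion set was not fully erased.

```python
# Hypothetical sketch of an Interclass Confusion (IC) style metric.
# Setup (assumed): two classes A and B had their labels swapped inside
# the deletion set before training. After unlearning, we check how often
# the model still predicts A/B samples as the *other* class.

def ic_error(predictions, labels, class_a, class_b):
    """Fraction of held-out A/B samples predicted as the other class.

    High residual confusion after unlearning hints that deletion-set
    information survived; near-zero confusion is consistent with forgetting.
    """
    confused = total = 0
    for pred, true in zip(predictions, labels):
        if true == class_a:
            total += 1
            confused += (pred == class_b)
        elif true == class_b:
            total += 1
            confused += (pred == class_a)
    return confused / total if total else 0.0

# Toy usage: class 0 and class 1 were confused; class 2 is untouched.
labels      = [0, 0, 1, 1, 2]
predictions = [1, 0, 0, 1, 2]   # two of the four class-0/1 samples flip
print(ic_error(predictions, labels, class_a=0, class_b=1))  # 0.5
```

Because the metric only needs model predictions, it stays black-box, matching the abstract's claim that IC requires no access to internal weights.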
