A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models

17 February 2025

Abstract

Text-to-Image (T2I) models have made remarkable progress in generating high-quality, diverse visual content from natural language prompts. However, their ability to reproduce copyrighted styles, sensitive imagery, and harmful content raises significant ethical and legal concerns. Concept erasure offers a proactive alternative to external filtering by modifying T2I models to prevent the generation of undesired content. In this survey, we provide a structured overview of concept erasure, categorizing existing methods by their optimization strategies and the architectural components they modify. By optimization strategy, methods fall into three groups: fine-tuning for parameter updates, closed-form solutions for efficient edits, and inference-time interventions for content restriction without weight modification. Additionally, we explore adversarial attacks that bypass erasure techniques and discuss emerging defenses. To support further research, we consolidate key datasets, evaluation metrics, and benchmarks for assessing erasure effectiveness and model robustness. This survey serves as a comprehensive resource, offering insights into the evolving landscape of concept erasure, its challenges, and future directions.

@article{kim2025_2502.14896,
  title={A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models},
  author={Changhoon Kim and Yanjun Qi},
  journal={arXiv preprint arXiv:2502.14896},
  year={2025}
}
