Z-Erase: Enabling Concept Erasure in Single-Stream Diffusion Transformers

27 March 2026

Nanxiang Jiang

Zhaoxin Fan

Baisen Wang

Daiheng Gao

Junhang Cheng

Jifeng Guo

Yalan Qin

Yeying Jin

Hongwei Zheng

Faguo Wu

Wenjun Wu


Main: 8 pages; Bibliography: 3 pages; Appendix: 17 pages; 18 figures; 11 tables

Abstract

Concept erasure serves as a vital safety mechanism for removing unwanted concepts from text-to-image (T2I) models. While it has been extensively studied in U-Net and dual-stream architectures (e.g., Flux), the task remains under-explored in the recently emerging paradigm of single-stream diffusion transformers (e.g., Z-Image). In this paradigm, text and image tokens are processed as a single unified sequence via shared parameters. Consequently, directly applying prior erasure methods typically leads to generation collapse. To bridge this gap, we introduce Z-Erase, the first concept erasure method tailored for single-stream T2I models. To guarantee stable image generation, Z-Erase first proposes a Stream Disentangled Concept Erasure Framework that decouples updates and enables existing methods to operate on single-stream models. Within this framework, we then introduce Lagrangian-Guided Adaptive Erasure Modulation, a constrained algorithm that further balances the sensitive erasure-preservation trade-off. Moreover, we provide a rigorous convergence analysis proving that Z-Erase can converge to a Pareto stationary point. Experiments demonstrate that Z-Erase overcomes the generation collapse issue and achieves state-of-the-art performance across a wide range of tasks.
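The abstract does not spell out the constrained algorithm, so the following is only a generic illustration of what a Lagrangian treatment of an erasure-preservation trade-off can look like, not the Z-Erase procedure itself. It is a minimal sketch under stated assumptions: the two quadratic losses, the two-dimensional parameter vector, the budget `eps`, the step sizes, and the function name `run_primal_dual` are all hypothetical stand-ins chosen so the example runs on its own.

```python
# Toy sketch only: generic primal-dual (Lagrangian) updates on made-up
# quadratic losses. This is NOT the Z-Erase algorithm; every name and
# constant here is a hypothetical placeholder.


def erase_loss(theta):
    # Hypothetical erasure objective: pull parameters toward the point (1, 1).
    return sum((t - 1.0) ** 2 for t in theta)


def erase_grad(theta):
    return [2.0 * (t - 1.0) for t in theta]


def preserve_loss(theta):
    # Hypothetical preservation objective: stay near the original weights (0, 0).
    return sum(t * t for t in theta)


def preserve_grad(theta):
    return [2.0 * t for t in theta]


def run_primal_dual(eps=0.5, lr=0.05, dual_lr=0.05, steps=2000):
    """Minimize erase_loss subject to preserve_loss <= eps (toy problem)."""
    theta = [0.0, 0.0]  # start at the "original" parameters
    lam = 0.0           # Lagrange multiplier for the preservation constraint
    for _ in range(steps):
        g_erase = erase_grad(theta)
        g_keep = preserve_grad(theta)
        # Primal step: gradient descent on erase_loss + lam * (preserve_loss - eps).
        theta = [t - lr * (ge + lam * gk)
                 for t, ge, gk in zip(theta, g_erase, g_keep)]
        # Dual step: projected gradient ascent; lam rises only while the
        # preservation budget is exceeded and is clipped at zero otherwise.
        lam = max(0.0, lam + dual_lr * (preserve_loss(theta) - eps))
    return theta, lam


if __name__ == "__main__":
    theta, lam = run_primal_dual()
    print("theta:", theta, "lambda:", lam)
    print("erase loss:", erase_loss(theta), "preserve loss:", preserve_loss(theta))
```

In this toy setting the multiplier acts as an adaptive weight: it stays at zero while the preservation budget is respected and grows once erasure pushes the parameters past it, which is one common way a constrained formulation replaces a hand-tuned trade-off coefficient.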
