MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

Changlu Guo
Anders Nymark Christensen
Anders Bjorholm Dahl
Morten Rieger Hannemose
Main: 8 pages; Bibliography: 3 pages; 7 figures; 4 tables
Abstract

Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample from, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, and effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions to produce localized and semantically consistent counterfactuals while preserving high image fidelity. The training-free MaskDiME framework achieves over 30x faster inference than the baseline method and delivers comparable or state-of-the-art performance across five benchmark datasets spanning diverse visual domains, establishing a practical and generalizable solution for efficient counterfactual explanation.
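The abstract gives no implementation details, so the following is only a minimal sketch of what localized diffusion sampling can look like in general, not the authors' method. It uses a RePaint-style masked denoising loop: at each reverse step, the region outside a decision-relevant mask is overwritten with a forward-noised copy of the original image, so only the masked region is edited while the background is reproduced faithfully. The names `eps_model`, `mask`, `alphas_cumprod`, and the deterministic DDIM step are all assumptions.

```python
import torch

@torch.no_grad()
def masked_counterfactual_sample(eps_model, x_orig, mask, alphas_cumprod, timesteps):
    """Sketch of localized (masked) diffusion sampling, RePaint-style.

    Assumed interfaces (not from the paper):
      eps_model:      noise-prediction network eps_theta(x_t, t)
      x_orig:         original image in [-1, 1], shape (B, C, H, W)
      mask:           1 where edits are allowed (decision-relevant), 0 elsewhere
      alphas_cumprod: cumulative products of alphas, 1-D tensor of length T
      timesteps:      descending list of integer timesteps
    """
    batch = x_orig.shape[0]
    x_t = torch.randn_like(x_orig)  # start the reverse process from pure noise
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        t_batch = torch.full((batch,), t, device=x_orig.device, dtype=torch.long)
        # Predict the noise and recover the implied clean image x_0.
        eps = eps_model(x_t, t_batch)
        x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # Deterministic DDIM step to the previous timestep.
        t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else None
        a_prev = alphas_cumprod[t_prev] if t_prev is not None else torch.tensor(1.0)
        x_t = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
        # Outside the mask, replace the sample with the forward-noised original,
        # so edits stay confined to the decision-relevant region.
        noise = torch.randn_like(x_orig)
        x_known = a_prev.sqrt() * x_orig + (1 - a_prev).sqrt() * noise
        x_t = mask * x_t + (1 - mask) * x_known
    return x_t
```

Because the final step uses the noise-free original outside the mask, the returned image matches `x_orig` exactly in unmasked regions, which is one way a masked sampler can obtain the spatial precision the abstract describes.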
