MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

21 February 2026

Changlu Guo

Anders Nymark Christensen

Anders Bjorholm Dahl

Morten Rieger Hannemose

DiffM

VGen


Main: 8 pages, 7 figures, 4 tables. Bibliography: 3 pages.

Abstract

Visual counterfactual explanations aim to reveal the minimal semantic modifications that alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, and effective training-free diffusion framework that unifies semantic consistency and spatial precision through localized sampling. It adaptively focuses on decision-relevant regions, so edits stay local and semantically consistent while image fidelity is preserved. MaskDiME achieves over 30x faster inference than the baseline method and comparable or state-of-the-art performance across five benchmark datasets spanning diverse visual domains, establishing a practical and generalizable solution for efficient counterfactual explanation.
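To make the idea of localized sampling concrete, the following is a minimal sketch of generic mask-based blending, in which an edited image replaces the original only inside a mask of high-relevance pixels. It is not the MaskDiME algorithm: the abstract does not state how the mask is computed or at which diffusion steps it is applied. The saliency map, the `fraction` parameter, and the function names `top_fraction_mask` and `masked_blend` are all hypothetical and chosen for illustration.

```python
# Illustrative sketch only: generic masked blending, NOT the paper's method.
# Images and saliency maps are plain nested lists (rows of floats).

def top_fraction_mask(saliency, fraction):
    """Return a 0/1 mask marking roughly the top `fraction` of pixels by saliency.

    Assumption: relevance is given as a per-pixel saliency map. Ties at the
    threshold are all included, so slightly more pixels may be selected.
    """
    flat = sorted((v for row in saliency for v in row), reverse=True)
    k = max(1, round(fraction * len(flat)))
    threshold = flat[k - 1]
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in saliency]


def masked_blend(original, edited, mask):
    """Keep `edited` where mask is 1 and `original` where mask is 0."""
    return [
        [m * e + (1.0 - m) * o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


if __name__ == "__main__":
    saliency = [[0.1, 0.9], [0.2, 0.3]]   # hypothetical relevance scores
    original = [[0.5, 0.5], [0.5, 0.5]]   # input image (single channel)
    edited = [[1.0, 1.0], [1.0, 1.0]]     # candidate counterfactual edit

    mask = top_fraction_mask(saliency, fraction=0.25)
    print(mask)                                  # [[0.0, 1.0], [0.0, 0.0]]
    print(masked_blend(original, edited, mask))  # [[0.5, 1.0], [0.5, 0.5]]
```

In this toy example only the single most salient pixel is modified, and every other pixel is copied unchanged from the input, which is the sense in which a mask keeps edits spatially precise.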
