LEMON: Local Explanations via Modality-aware OptimizatioN

Yu Qin
Phillip Sloan
Raul Santos-Rodriguez
Majid Mirmehdi
Telmo de Menezes e Silva Filho
Main: 8 pages, 6 figures, 4 tables; Bibliography: 2 pages; Appendix: 4 pages
Abstract

Multimodal models are ubiquitous, yet existing explainability methods are often single-modal, architecture-dependent, or too computationally expensive to run at scale. We introduce LEMON (Local Explanations via Modality-aware OptimizatioN), a model-agnostic framework for local explanations of multimodal predictions. LEMON fits a single modality-aware surrogate with group-structured sparsity, producing unified explanations that disentangle modality-level contributions from feature-level attributions. The approach treats the predictor as a black box and is computationally efficient, requiring relatively few forward passes while remaining faithful under repeated perturbations. We evaluate LEMON on vision-language question answering and on a clinical prediction task with image, text, and tabular inputs, comparing against representative multimodal baselines. Across backbones, LEMON achieves competitive deletion-based faithfulness while reducing black-box evaluations by 35–67x and runtime by 2–8x relative to strong multimodal baselines.
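The abstract's core idea, fitting a sparse surrogate over modality-grouped perturbation features, can be illustrated with a small sketch. Everything below is an assumption for illustration only, not the authors' implementation: the toy black box, the binary perturbation masks, the group-lasso objective, and the proximal-gradient solver are all hypothetical stand-ins for whatever LEMON actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 "image" features and 3 "text" features per instance.
d, n = 6, 500
groups = {"image": np.arange(0, 3), "text": np.arange(3, 6)}

# Toy black box whose prediction depends only on the image features.
true_w = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])
Z = rng.integers(0, 2, size=(n, d)).astype(float)  # binary perturbation masks
y = Z @ true_w + 0.01 * rng.standard_normal(n)     # black-box responses

# Group-lasso surrogate (one possible formulation):
#   min_w  (1/2n) ||Z w - y||^2  +  lam * sum_g ||w_g||_2
# solved by proximal gradient with a block soft-threshold step.
lam, lr = 0.05, 0.1  # step size chosen small enough for this toy design
w = np.zeros(d)
for _ in range(2000):
    grad = Z.T @ (Z @ w - y) / n          # gradient of the squared loss
    w = w - lr * grad
    for idx in groups.values():           # prox: shrink each group as a block
        norm = np.linalg.norm(w[idx])
        if norm > 0:
            w[idx] *= max(0.0, 1.0 - lr * lam / norm)

# Modality-level contribution = norm of each group's surrogate weights;
# feature-level attributions = the individual entries of w.
contrib = {name: np.linalg.norm(w[idx]) for name, idx in groups.items()}
```

On this toy problem the block penalty drives the irrelevant "text" group toward zero while keeping the "image" group active, which is the disentangling behavior the abstract describes.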
