
GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion

Main: 13 pages, 12 figures, 9 tables; Bibliography: 3 pages
Abstract

In the field of image fusion, promising progress has been made by modeling data from different modalities as linear subspaces. In practice, however, the source images often lie in a non-Euclidean space, where Euclidean methods cannot encapsulate the intrinsic topological structure. Typically, the inner product performed in Euclidean space measures algebraic similarity rather than semantic similarity, which results in undesired attention output and a decrease in fusion performance, whereas the infrared and visible image fusion task requires a balance between low-level details and high-level semantics. To address this issue, in this paper we propose a novel attention mechanism based on the Grassmann manifold for infrared and visible image fusion (GrFormer). Specifically, our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold, compressing attention features into subspaces of varying rank levels. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank), thereby achieving multi-scale semantic fusion. Additionally, to effectively integrate the significant information, we develop a cross-modal fusion strategy (CMS) based on a covariance mask to maximise the complementary properties between different modalities and to suppress features with high correlation, which are deemed redundant. The experimental results demonstrate that our network outperforms SOTA methods both qualitatively and quantitatively on multiple image fusion benchmarks. The codes are available at this https URL.
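The two core ideas in the abstract (low-rank subspace projection on the Grassmann manifold, and a covariance mask that keeps complementary channels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the rank value, threshold `tau`, and the toy feature matrices are illustrative assumptions.

```python
import numpy as np

def grassmann_project(X, rank):
    """Project a feature matrix X (channels x tokens) onto a rank-r subspace.
    The column space of U_r is a point on the Grassmann manifold Gr(rank, d);
    a small rank yields a global low-rank (semantic) reconstruction."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_r = U[:, :rank]             # orthonormal basis of the subspace
    return U_r @ (U_r.T @ X)      # low-rank reconstruction of the features

def covariance_mask(F_a, F_b, tau=0.5):
    """Per-channel mask keeping low cross-modal correlation (complementary)
    channels and suppressing highly correlated (redundant) ones.
    `tau` is an illustrative threshold, not a value from the paper."""
    a = F_a - F_a.mean(axis=1, keepdims=True)
    b = F_b - F_b.mean(axis=1, keepdims=True)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    corr = np.abs(np.sum(a * b, axis=1))        # |correlation| per channel
    return (corr < tau).astype(F_a.dtype)       # 1 = complementary, 0 = redundant

rng = np.random.default_rng(0)
F_ir = rng.standard_normal((8, 64))    # toy infrared features (channels x tokens)
F_vis = rng.standard_normal((8, 64))   # toy visible features

semantic = grassmann_project(F_ir, rank=2)      # global low-rank component
mask = covariance_mask(F_ir, F_vis)
fused = mask[:, None] * F_ir + F_vis            # inject complementary IR channels
```

The sketch only conveys the intuition: attention features compressed to a low-rank subspace retain coarse semantics, and the covariance mask acts as a gate that favours information one modality has and the other lacks.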

@article{kang2025_2506.14384,
  title={GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion},
  author={Huan Kang and Hui Li and Xiao-Jun Wu and Tianyang Xu and Rui Wang and Chunyang Cheng and Josef Kittler},
  journal={arXiv preprint arXiv:2506.14384},
  year={2025}
}