
DGIQA: Depth-guided Feature Attention and Refinement for Generalizable Image Quality Assessment

Main: 8 Pages
Appendix: 7 Pages
Bibliography: 3 Pages
16 Figures
6 Tables
Abstract

A long-standing challenge in no-reference image quality assessment (NR-IQA), which learns from human subjective perception, is poor generalization to unseen natural distortions. To address this, we integrate a novel Depth-Guided Cross-Attention and Refinement (Depth-CAR) mechanism, which distills scene depth and spatial features into a structure-aware representation for improved NR-IQA. This injects knowledge of object saliency and the relative contrast of the scene for more discriminative feature learning. Additionally, we introduce a Transformer-CNN Bridge (TCB) to fuse high-level global contextual dependencies from a transformer backbone with local spatial features captured by a set of hierarchical convolutional neural network (CNN) layers. We implement TCB and Depth-CAR as multimodal attention-based projection functions that select the most informative features, which also improves training time and inference efficiency. Experimental results demonstrate that our proposed DGIQA model achieves state-of-the-art (SOTA) performance on both synthetic and authentic benchmark datasets. More importantly, DGIQA outperforms SOTA models in cross-dataset evaluations as well as in assessing natural image distortions such as low-light effects, hazy conditions, and lens flares.
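The Depth-CAR mechanism described above can be pictured as a cross-attention block in which depth features steer attention over RGB features. Below is a minimal PyTorch sketch of that idea; the module name DepthCrossAttention, the token shapes, and the query/key/value assignment are illustrative assumptions based only on the abstract, not the authors' released implementation.

import torch
import torch.nn as nn

class DepthCrossAttention(nn.Module):
    """Cross-attention where depth tokens query RGB feature tokens,
    yielding a structure-aware fused representation (Depth-CAR-style)."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Lightweight refinement head applied after attention fusion.
        self.refine = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, rgb_tokens, depth_tokens):
        # Depth features act as queries; RGB features supply keys and
        # values, so attention weights emphasize depth-salient regions.
        fused, _ = self.attn(depth_tokens, rgb_tokens, rgb_tokens)
        fused = self.norm1(fused + depth_tokens)       # residual + norm
        return self.norm2(self.refine(fused) + fused)  # refinement + norm

# Usage: 196 spatial tokens (a 14x14 feature grid) with 256-dim features.
rgb = torch.randn(2, 196, 256)
depth = torch.randn(2, 196, 256)
print(DepthCrossAttention()(rgb, depth).shape)  # torch.Size([2, 196, 256])

Using depth as the query here is one plausible reading of "depth-guided"; the reverse assignment (RGB queries attending over depth keys and values) would realize the same mechanism.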

@article{ramesh2025_2505.24002,
  title={DGIQA: Depth-guided Feature Attention and Refinement for Generalizable Image Quality Assessment},
  author={Vaishnav Ramesh and Junliang Liu and Haining Wang and Md Jahidul Islam},
  journal={arXiv preprint arXiv:2505.24002},
  year={2025}
}