CLIP-aware Domain-Adaptive Super-Resolution

18 May 2025

Abstract

This work introduces CLIP-aware Domain-Adaptive Super-Resolution (CDASR), a novel framework that addresses the critical challenge of domain generalization in single image super-resolution. By leveraging the semantic capabilities of CLIP (Contrastive Language-Image Pre-training), CDASR achieves unprecedented performance across diverse domains and extreme scaling factors. The proposed method integrates CLIP-guided feature alignment mechanism with a meta-learning inspired few-shot adaptation strategy, enabling efficient knowledge transfer and rapid adaptation to target domains. A custom domain-adaptive module processes CLIP features alongside super-resolution features through a multi-stage transformation process, including CLIP feature processing, spatial feature generation, and feature fusion. This intricate process ensures effective incorporation of semantic information into the super-resolution pipeline. Additionally, CDASR employs a multi-component loss function that combines pixel-wise reconstruction, perceptual similarity, and semantic consistency. Extensive experiments on benchmark datasets demonstrate CDASR's superiority, particularly in challenging scenarios. On the Urban100 dataset at $\times$ 8 scaling, CDASR achieves a significant PSNR gain of 0.15dB over existing methods, with even larger improvements of up to 0.30dB observed at $\times$ 16 scaling.

View on arXiv

@article{lu2025_2505.12391,
  title={ CLIP-aware Domain-Adaptive Super-Resolution },
  author={ Zhengyang Lu and Qian Xia and Weifan Wang and Feng Wang },
  journal={arXiv preprint arXiv:2505.12391},
  year={ 2025 }
}

Comments on this paper