One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution

18 June 2025

Yujing Sun

Lingchen Sun

Shuaizheng Liu

Rongyuan Wu

Zhengqiang Zhang

Lei Zhang

Author Contacts:

yujinsun@polyu.edu.hk ling-chen.sun@connect.polyu.hk shuaizheng.liu@connect.polyu.hk rong-yuan.wu@connect.polyu.hk zhengqiang.zhang@connect.polyu.hk cslzhang@comp.polyu.edu.hk

Main: 15 Pages

9 Figures

Bibliography: 4 Pages

3 Tables

Abstract

Reproducing rich spatial details while maintaining temporal consistency is a challenging problem in real-world video super-resolution (Real-VSR), especially when pre-trained generative models such as Stable Diffusion (SD) are leveraged for realistic detail synthesis. Existing SD-based Real-VSR methods often compromise spatial details for temporal coherence, resulting in suboptimal visual quality. We argue that the key lies in effectively extracting degradation-robust temporal consistency priors from the low-quality (LQ) input video and enhancing the video details while maintaining the extracted consistency priors. To achieve this, we propose a Dual LoRA Learning (DLoRAL) paradigm to train an effective SD-based one-step diffusion model that achieves realistic frame details and temporal consistency simultaneously. Specifically, we introduce a Cross-Frame Retrieval (CFR) module to aggregate complementary information across frames, and train a Consistency-LoRA (C-LoRA) to learn robust temporal representations from degraded inputs. After consistency learning, we fix the CFR and C-LoRA modules and train a Detail-LoRA (D-LoRA) to enhance spatial details while aligning with the temporal space defined by C-LoRA to preserve temporal coherence. The two phases alternate iteratively during optimization, collaboratively delivering consistent and detail-rich outputs. During inference, the two LoRA branches are merged into the SD model, allowing efficient and high-quality video restoration in a single diffusion step. Experiments show that DLoRAL achieves strong performance in both accuracy and speed. Code and models are available at this https URL.
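The abstract states that the two LoRA branches are merged into the SD model at inference. As a rough illustration of what merging low-rank adapters into a base weight generally means, the sketch below folds two adapters into one matrix using the standard LoRA update W' = W + sum(scale * B A). This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (matmul, matvec, merge_lora), the toy 2x2 shapes, the rank of 1, and the scale values are all hypothetical and chosen only for readability.

```python
# Illustrative sketch only: generic LoRA merging, NOT the DLoRAL code.
# All names, shapes, and scale values below are hypothetical assumptions.


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matvec(w, x):
    """Apply matrix w to vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def merge_lora(base, branches):
    """Return base + sum(scale * (B @ A)) over all (B, A, scale) branches.

    B has shape (out, r) and A has shape (r, in), so B @ A matches base.
    """
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for b_mat, a_mat, scale in branches:
        delta = matmul(b_mat, a_mat)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


# Toy frozen base weight (stands in for one layer of the pre-trained model).
BASE = [[1, 0],
        [0, 1]]

# Two rank-1 adapters standing in for a "consistency" and a "detail" branch.
C_LORA = ([[1], [2]], [[1, 1]], 1.0)   # (B, A, scale), hypothetical values
D_LORA = ([[0], [1]], [[2, 0]], 0.5)   # (B, A, scale), hypothetical values

MERGED = merge_lora(BASE, [C_LORA, D_LORA])

# After merging, one matrix-vector product replaces the base path plus
# two adapter paths, so inference needs no extra adapter computation.
X = [1, 2]
Y_MERGED = matvec(MERGED, X)
```

Because the adapters are folded into the base weight once, the merged layer costs the same at inference as the original layer, which is the general reason merged LoRA branches add no runtime overhead.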

@article{sun2025_2506.15591,
  title={One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution},
  author={Yujing Sun and Lingchen Sun and Shuaizheng Liu and Rongyuan Wu and Zhengqiang Zhang and Lei Zhang},
  journal={arXiv preprint arXiv:2506.15591},
  year={2025}
}
