ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On

Main: 9 pages · Appendix: 9 pages · Bibliography: 3 pages · 22 figures · 5 tables
Abstract

Video virtual try-on aims to seamlessly replace the clothing of a person in a source video with a target garment. Despite significant progress in this field, existing approaches still struggle to maintain temporal continuity and reproduce garment details. In this paper, we introduce ChronoTailor, a diffusion-based framework that generates temporally consistent videos while preserving fine-grained garment details. By employing a precise spatio-temporal attention mechanism to guide the integration of fine-grained garment features, ChronoTailor achieves robust try-on performance. First, ChronoTailor leverages region-aware spatial guidance to steer the evolution of spatial attention and employs an attention-driven temporal feature fusion mechanism to generate more continuous temporal features. This dual approach not only enables fine-grained local editing but also effectively mitigates artifacts arising from video dynamics. Second, ChronoTailor integrates multi-scale garment features to preserve low-level visual details and incorporates garment-pose feature alignment to ensure temporal continuity during dynamic motion. Additionally, we collect StyleDress, a new dataset featuring intricate garments, varied environments, and diverse poses that offers advantages over existing public datasets; it will be made publicly available for research. Extensive experiments show that ChronoTailor maintains spatio-temporal continuity and preserves garment details during motion, significantly outperforming previous methods.
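The abstract does not specify how region-aware spatial guidance is implemented; as a rough illustration only, one common way to steer attention toward a spatial region is to add a bias to attention scores for keys inside a mask. The sketch below is a toy single-head version under that assumption (the function name, shapes, and `bias` parameter are hypothetical, not taken from the paper):

```python
import numpy as np

def region_biased_attention(q, k, v, region_mask, bias=4.0):
    """Toy single-head attention with an additive bias toward keys
    inside a (hypothetical) garment-region mask.

    q: (Nq, d) queries; k, v: (Nk, d) keys/values;
    region_mask: (Nk,) array, 1.0 for garment-region positions.
    NOTE: an illustrative sketch, not ChronoTailor's actual mechanism.
    """
    d = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d)           # (Nq, Nk) scaled dot products
    scores = scores + bias * region_mask      # boost garment-region keys
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                              # (Nq, d) attended values

# With uniform scores, the biased key dominates the attention weights.
q = np.zeros((1, 2))
k = np.zeros((3, 2))
v = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
mask = np.array([0.0, 0.0, 1.0])
out = region_biased_attention(q, k, v, mask)
```

With `bias=4.0` and otherwise uniform scores, the masked key receives roughly 96% of the attention weight, so the output is pulled strongly toward its value.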

@article{wang2025_2506.05858,
  title={ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On},
  author={Jinjuan Wang and Wenzhang Sun and Ming Li and Yun Zheng and Fanyao Li and Zhulin Tao and Donglin Di and Hao Li and Wei Chen and Xianglin Huang},
  journal={arXiv preprint arXiv:2506.05858},
  year={2025}
}