ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation

Jimyeong Kim
Jungwon Park
Yeji Song
Nojun Kwak
Wonjong Rhee
Main: 8 pages, Bibliography: 2 pages, Appendix: 15 pages, 31 figures, 2 tables
Abstract

Rectified Flow (ReFlow) text-to-image models surpass diffusion models in image quality and text alignment, but adapting ReFlow for real-image editing remains challenging. We propose a new real-image editing method for ReFlow by analyzing the intermediate representations of multimodal transformer blocks and identifying three key features. To extract these features from real images with sufficient structural preservation, we leverage a mid-step latent, which is inverted only up to the mid-step. We then adapt attention during injection to improve editability and enhance alignment with the target text. Our method is training-free, requires no user-provided mask, and can be applied even without a source prompt. Extensive experiments on two benchmarks with nine baselines demonstrate its superior performance over prior methods, and human evaluations further confirm a strong user preference for our approach.

@article{kim2025_2507.01496,
  title={ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation},
  author={Jimyeong Kim and Jungwon Park and Yeji Song and Nojun Kwak and Wonjong Rhee},
  journal={arXiv preprint arXiv:2507.01496},
  year={2025}
}