Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences

2 June 2025
Hyojin Bahng
Caroline Chan
Fredo Durand
Phillip Isola
Abstract

Learning alignment between language and vision is a fundamental challenge, especially as multimodal data becomes increasingly detailed and complex. Existing methods often rely on collecting human or AI preferences, which can be costly and time-intensive. We propose an alternative approach that leverages cycle consistency as a supervisory signal. Given an image and generated text, we map the text back to image space using a text-to-image model and compute the similarity between the original image and its reconstruction. Analogously, for text-to-image generation, we measure the textual similarity between an input caption and its reconstruction through the cycle. We use the cycle consistency score to rank candidates and construct a preference dataset of 866K comparison pairs. The reward model trained on our dataset outperforms state-of-the-art alignment metrics on detailed captioning, with superior inference-time scalability when used as a verifier for Best-of-N sampling. Furthermore, performing DPO and Diffusion DPO using our dataset enhances performance across a wide range of vision-language tasks and text-to-image generation. Our dataset, model, and code are at this https URL
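As a rough illustration of the scoring procedure described in the abstract, the sketch below shows how cycle-consistency scores might be used to rank candidate captions for an image and turn the ranking into comparison pairs. The text_to_image and image_similarity callables, as well as the pairing scheme in preference_pairs, are hypothetical placeholders chosen for illustration, not the authors' implementation.

```python
from typing import Callable, List, Tuple

def cycle_consistency_scores(
    image,
    candidate_captions: List[str],
    text_to_image: Callable,     # hypothetical: maps a caption to a reconstructed image
    image_similarity: Callable,  # hypothetical: similarity of two images, higher = more alike
) -> List[Tuple[str, float]]:
    """Rank candidate captions for an image by cycle consistency:
    caption -> reconstructed image -> similarity to the original image."""
    scored = []
    for caption in candidate_captions:
        reconstruction = text_to_image(caption)
        scored.append((caption, image_similarity(image, reconstruction)))
    # Highest cycle-consistency score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def preference_pairs(ranked: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Turn a ranked candidate list into (chosen, rejected) comparison pairs by
    pairing the top-ranked caption against each lower-ranked one (one possible scheme)."""
    best = ranked[0][0]
    return [(best, caption) for caption, _ in ranked[1:]]
```

The same idea applies in the text-to-image direction, with a captioning model closing the cycle and a text-similarity function scoring the reconstruction against the input caption.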

View on arXiv
@article{bahng2025_2506.02095,
  title={Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences},
  author={Hyojin Bahng and Caroline Chan and Fredo Durand and Phillip Isola},
  journal={arXiv preprint arXiv:2506.02095},
  year={2025}
}