OViP: Online Vision-Language Preference Learning

21 May 2025
Shujun Liu
Siyuan Wang
Zejun Li
Jianxiang Wang
Cheng Zeng
Zhongyu Wei
Topics: MLLM, VLM
Abstract

Large vision-language models (LVLMs) remain vulnerable to hallucination, often generating content misaligned with visual inputs. While recent approaches advance multi-modal Direct Preference Optimization (DPO) to mitigate hallucination, they typically rely on predefined or randomly edited negative samples that fail to reflect actual model errors, limiting training efficacy. In this work, we propose an Online Vision-language Preference Learning (OViP) framework that dynamically constructs contrastive training data based on the model's own hallucinated outputs. By identifying semantic differences between sampled response pairs and synthesizing negative images using a diffusion model, OViP generates more relevant supervision signals in real time. This failure-driven training enables adaptive alignment of both textual and visual preferences. Moreover, we refine existing evaluation protocols to better capture the trade-off between hallucination suppression and expressiveness. Experiments on hallucination and general benchmarks demonstrate that OViP effectively reduces hallucinations while preserving core multi-modal capabilities.
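The abstract describes training on contrastive pairs with a multi-modal DPO-style objective, where negatives come from the model's own hallucinated outputs. As a rough illustration of the preference objective such methods build on (a minimal sketch of the standard DPO loss, not OViP's actual loss, whose exact form the abstract does not specify; all names here are hypothetical):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-likelihoods of the chosen (faithful) and
    rejected (hallucinated) responses under the policy and a frozen
    reference model. Loss = -log sigmoid(beta * margin).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)); equals log(2) when the margin is zero
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no preference margin the loss sits at log(2); it drops as the
# policy favors the chosen response relative to the reference.
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
improved = dpo_loss(-2.0, -8.0, -5.0, -5.0)
```

In OViP's online setting, the pairs fed to such a loss would be built on the fly from sampled response pairs (and, per the abstract, from negative images synthesized by a diffusion model), rather than from a fixed preference dataset.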

@article{liu2025_2505.15963,
  title={OViP: Online Vision-Language Preference Learning},
  author={Shujun Liu and Siyuan Wang and Zejun Li and Jianxiang Wang and Cheng Zeng and Zhongyu Wei},
  journal={arXiv preprint arXiv:2505.15963},
  year={2025}
}