GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

16 May 2025
Yue Liu
Shengfang Zhai
Mingzhe Du
Yulin Chen
Tri Cao
Hongcheng Gao
Cheng Wang
Xinfeng Li
Kun Wang
Junfeng Fang
Jiaheng Zhang
Bryan Hooi
Topics: OffRL, LRM
Abstract

To enhance the safety of VLMs, this paper introduces a novel reasoning-based VLM guard model dubbed GuardReasoner-VL. The core idea is to incentivize the guard model to deliberatively reason before making moderation decisions via online RL. First, we construct GuardReasoner-VLTrain, a reasoning corpus with 123K samples and 631K reasoning steps, spanning text, image, and text-image inputs. Then, building on it, we cold-start our model's reasoning ability via SFT, and further enhance its moderation-related reasoning through online RL. Concretely, to increase the diversity and difficulty of training samples, we perform rejection sampling followed by data augmentation via the proposed safety-aware data concatenation. Moreover, we use a dynamic clipping parameter to encourage exploration in the early stages and exploitation in the later stages. To balance performance and token efficiency, we design a length-aware safety reward that integrates accuracy, format, and token cost. Extensive experiments demonstrate the superiority of our model. Remarkably, it surpasses the runner-up by 19.27% F1 score on average. We release the data, code, and models (3B/7B) of GuardReasoner-VL at this https URL
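To make the two RL components in the abstract concrete, here is a minimal Python sketch of (a) a length-aware safety reward combining accuracy, format, and token cost, and (b) a clipping-range schedule that widens early exploration and narrows toward later exploitation. This is not the authors' released code: the weights, the linear over-budget penalty, the linear annealing, and all names (`length_aware_safety_reward`, `dynamic_clip`, `token_budget`, etc.) are illustrative assumptions.

```python
# Hypothetical sketch of the two RL ingredients described in the abstract.
# Functional forms and constants are assumptions, not values from the paper.

def length_aware_safety_reward(
    pred_label: str,
    gold_label: str,
    format_ok: bool,
    num_tokens: int,
    token_budget: int = 512,   # assumed budget
    w_acc: float = 1.0,        # assumed weights
    w_fmt: float = 0.2,
    w_len: float = 0.1,
) -> float:
    """Combine moderation accuracy, output format, and token cost."""
    r_acc = 1.0 if pred_label == gold_label else 0.0  # accuracy term
    r_fmt = 1.0 if format_ok else 0.0                 # format term
    # Linearly penalize tokens beyond the budget (assumed penalty shape).
    over = max(0, num_tokens - token_budget)
    r_len = -over / token_budget
    return w_acc * r_acc + w_fmt * r_fmt + w_len * r_len


def dynamic_clip(step: int, total_steps: int,
                 eps_start: float = 0.3, eps_end: float = 0.1) -> float:
    """Anneal a PPO-style clipping range: wide early (exploration),
    narrow late (exploitation). Linear schedule is an assumption."""
    frac = min(step / max(total_steps, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)


if __name__ == "__main__":
    # Correct label and format, but 188 tokens over budget.
    print(length_aware_safety_reward("harmful", "harmful", True, 700))
    # Clip range shrinks from 0.3 at the start to 0.1 at the end.
    print(dynamic_clip(0, 1000), dynamic_clip(1000, 1000))
```

The key design point the abstract emphasizes is the trade-off: the reward keeps accuracy dominant while the token-cost term discourages needlessly long reasoning traces, and the shrinking clip range stabilizes policy updates as training converges.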

View on arXiv
@article{liu2025_2505.11049,
  title={GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning},
  author={Yue Liu and Shengfang Zhai and Mingzhe Du and Yulin Chen and Tri Cao and Hongcheng Gao and Cheng Wang and Xinfeng Li and Kun Wang and Junfeng Fang and Jiaheng Zhang and Bryan Hooi},
  journal={arXiv preprint arXiv:2505.11049},
  year={2025}
}