VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning

17 May 2025
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
Abstract

Large vision-language models exhibit inherent capabilities to handle diverse visual perception tasks. In this paper, we introduce VisionReasoner, a unified framework capable of reasoning about and solving multiple visual perception tasks within a shared model. Specifically, by designing novel multi-object cognitive learning strategies and systematically reformulating tasks, VisionReasoner strengthens its ability to analyze visual inputs and addresses diverse perception tasks within a single framework. The model generates a structured reasoning process before delivering the outputs requested by user queries. To rigorously assess unified visual perception capabilities, we evaluate VisionReasoner on ten diverse tasks spanning three critical domains: detection, segmentation, and counting. Experimental results show that VisionReasoner achieves superior performance as a unified model, outperforming Qwen2.5VL by relative margins of 29.1% on COCO (detection), 22.1% on ReasonSeg (segmentation), and 15.3% on CountBench (counting).
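The abstract notes that the model emits a structured reasoning process before its final answer. A minimal sketch of consuming such output is shown below; the `<think>`/`<answer>` tags and the JSON bounding-box schema are assumptions chosen for illustration, not the paper's confirmed format.

```python
import json
import re

def parse_structured_output(text: str):
    """Split a reasoning-then-answer response into its two parts.

    Assumes (hypothetically) that the model wraps its reasoning trace in
    <think>...</think> and its final result in <answer>...</answer>, with
    the answer serialized as JSON.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    payload = json.loads(answer.group(1)) if answer else None
    return reasoning, payload

# Example response for a detection-style query ("find the cats"):
demo = (
    "<think>Two cats are visible; locate each one.</think>"
    '<answer>[{"bbox": [10, 20, 110, 200]}, {"bbox": [150, 30, 260, 210]}]</answer>'
)
reasoning, boxes = parse_structured_output(demo)
```

Separating the reasoning trace from the machine-readable answer lets a single model serve detection, segmentation, and counting queries through one output convention, which matches the unified-framework framing in the abstract.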

@article{liu2025_2505.12081,
  title={VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning},
  author={Yuqi Liu and Tianyuan Qu and Zhisheng Zhong and Bohao Peng and Shu Liu and Bei Yu and Jiaya Jia},
  journal={arXiv preprint arXiv:2505.12081},
  year={2025}
}