Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models

20 May 2025
Wenhui Zhu, Xuanzhao Dong, Xin Li, Peijie Qiu, Xiwen Chen, Abolfazl Razi, Aris Sotiras, Yi Su, Yalin Wang
Communities: OffRL, LM&MA
Abstract

Recently, reinforcement learning (RL)-based tuning has shifted the trajectory of Multimodal Large Language Models (MLLMs), particularly following the introduction of Group Relative Policy Optimization (GRPO). However, when it is applied directly to medical tasks, achieving clinically grounded model behavior remains challenging. Motivated by the need to align model responses with clinical expectations, we investigate four critical dimensions that affect the effectiveness of RL-based tuning in medical visual question answering (VQA): base model initialization strategy, the role of medical semantic alignment, the impact of length-based rewards on long-chain reasoning, and the influence of bias. We conduct extensive experiments to analyze these factors for medical MLLMs, providing new insights into domain-specific fine-tuning. Our results also demonstrate that GRPO-based RL tuning consistently outperforms standard supervised fine-tuning (SFT) in both accuracy and reasoning quality.
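For readers unfamiliar with GRPO, the sketch below gives the standard group-relative objective it is built on (following the commonly cited DeepSeekMath formulation with group size G, clipping parameter ε, and a KL penalty β against a reference policy); the paper's own reward design and hyperparameters are not reproduced here.

\[
A_i = \frac{r_i - \mathrm{mean}(r_1,\dots,r_G)}{\mathrm{std}(r_1,\dots,r_G)}, \qquad
\rho_i = \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)}
\]
\[
\mathcal{J}_{\mathrm{GRPO}}(\theta) =
\mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G} \min\!\big(\rho_i A_i,\ \mathrm{clip}(\rho_i,\, 1-\epsilon,\, 1+\epsilon)\, A_i\big)\right]
- \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)
\]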

@article{zhu2025_2505.13973,
  title={Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models},
  author={Wenhui Zhu and Xuanzhao Dong and Xin Li and Peijie Qiu and Xiwen Chen and Abolfazl Razi and Aris Sotiras and Yi Su and Yalin Wang},
  journal={arXiv preprint arXiv:2505.13973},
  year={2025}
}