Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning

26 May 2025

Papers citing "Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning"

6 / 56 papers shown

Title
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 694 28,659 0 26 Feb 2021
PlotQA: Reasoning over Scientific Plots Nitesh Methani Pritha Ganguly Mitesh M. Khapra Pratyush Kumar 84 216 0 03 Sep 2019
Proximal Policy Optimization Algorithms John Schulman Filip Wolski Prafulla Dhariwal Alec Radford Oleg Klimov OffRL 245 18,685 0 20 Jul 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Yash Goyal Tejas Khot D. Summers-Stay Dhruv Batra Devi Parikh CoGe 297 3,187 0 02 Dec 2016
Asynchronous Methods for Deep Reinforcement Learning Volodymyr Mnih Adria Puigdomenech Badia M. Berk Mirza Alex Graves Timothy Lillicrap Tim Harley David Silver Koray Kavukcuoglu 168 8,805 0 04 Feb 2016
Microsoft COCO Captions: Data Collection and Evaluation Server Xinlei Chen Hao Fang Nayeon Lee Ramakrishna Vedantam Saurabh Gupta Piotr Dollar C. L. Zitnick 161 2,461 0 01 Apr 2015