2504.07934
Cited By
v3 (latest)
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
10 April 2025
Xinze Wang
Zhiyong Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
LRM
VLM
Papers citing
"SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement"
50 / 93 papers shown
Title
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Yiqing Liang
Jielin Qiu
Wenhao Ding
Zuxin Liu
James Tompkin
Mengdi Xu
Mengzhou Xia
Zhengzhong Tu
Laixi Shi
Jiacheng Zhu
OffRL
90
0
0
30 May 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM
LRM
VLM
91
0
0
28 May 2025
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
Lai Wei
Yuting Li
Kaipeng Zheng
Chen Wang
Yue Wang
Linghe Kong
Lichao Sun
Weiran Huang
OffRL
ReLM
LRM
67
1
0
28 May 2025
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni
Zhengyuan Yang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
W. Zuo
Lijuan Wang
ReLM
LRM
79
1
0
26 May 2025
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
Yunxin Li
Xinyu Chen
Zitao Li
Zhenyu Liu
L. Wang
Wenhan Luo
Baotian Hu
Min Zhang
OffRL
LRM
121
0
0
25 May 2025
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
Haoyuan Sun
Jiaqi Wu
Bo Xia
Yifu Luo
Yifei Zhao
Kai Qin
Xufei Lv
Tiantian Zhang
Yongzhe Chang
Xueqian Wang
OffRL
LRM
200
0
0
24 May 2025
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
Huanjin Yao
Qixiang Yin
Jingyi Zhang
Min Yang
Yibo Wang
...
Fei Su
Li Shen
Minghui Qiu
Dacheng Tao
Jiaxing Huang
LRM
72
0
0
22 May 2025
DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data
Yuhang Zhou
Jing Zhu
Shengyi Qian
Zhuokai Zhao
Xiyao Wang
Xiaoyu Liu
Ming Li
Paiheng Xu
Wei Ai
Furong Huang
87
1
0
21 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
172
8
0
30 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
436
16
0
17 Apr 2025
Towards Visual Text Grounding of Multimodal Large Language Model
Ming Li
Ruiyi Zhang
Jian Chen
Jiuxiang Gu
Yufan Zhou
Franck Dernoncourt
Wanrong Zhu
Dinesh Manocha
Tong Sun
96
3
0
07 Apr 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
200
213
0
18 Mar 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang
Xiaoxuan He
Hongkun Pan
Xiyan Jiang
Yan Deng
...
Dacheng Yin
Fengyun Rao
Minfeng Zhu
Bo Zhang
Wei Chen
VLM
LRM
126
98
1
13 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
187
87
0
10 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
144
130
0
09 Mar 2025
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Michael Tschannen
A. Gritsenko
Xiao Wang
Muhammad Ferjad Naeem
Ibrahim Alabdulmohsin
...
Basil Mustafa
Olivier J. Hénaff
Jeremiah Harmsen
Andreas Steiner
Xiaohua Zhai
VLM
132
79
0
21 Feb 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
327
685
0
20 Feb 2025
LIMO: Less is More for Reasoning
Yixin Ye
Zhen Huang
Yang Xiao
Ethan Chern
Shijie Xia
Pengfei Liu
LRM
164
165
0
05 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
377
1,967
0
22 Jan 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Omkar Thawakar
Dinura Dissanayake
Ketan More
Ritesh Thawkar
Ahmed Heakl
...
Hisham Cholakkal
Ivan Laptev
Mubarak Shah
Fahad Shahbaz Khan
Salman Khan
VLM
LRM
112
57
0
10 Jan 2025
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
Yunzhuo Hao
Jiawei Gu
Huichen Will Wang
Linjie Li
Zhiyong Yang
Lijuan Wang
Yu Cheng
LRM
85
36
0
10 Jan 2025
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
Qifan Yu
Zhebei Shen
Zhongqi Yue
Yang Wu
Wenqiao Zhang
Yunfei Li
Juncheng Li
Siliang Tang
Yueting Zhuang
71
2
0
09 Dec 2024
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Jarvis Guo
Tuney Zheng
Yuelin Bai
Bo Li
Yubo Wang
King Zhu
Yizhi Li
Graham Neubig
Wenhu Chen
Xiang Yue
LRM
147
47
0
06 Dec 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
204
1,019
0
25 Oct 2024
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
Xiyao Wang
Linfeng Song
Ye Tian
Dian Yu
Baolin Peng
Haitao Mi
Furong Huang
Dong Yu
LRM
114
14
0
09 Oct 2024
LLaVA-Critic: Learning to Evaluate Multimodal Models
Tianyi Xiong
Xinze Wang
Dong Guo
Qinghao Ye
Haoqi Fan
Quanquan Gu
Heng Huang
Chunyuan Li
MLLM
VLM
LRM
116
53
0
03 Oct 2024
Interpretable Contrastive Monte Carlo Tree Search Reasoning
Zitian Gao
Boye Niu
Xuzheng He
Haotian Xu
Hongzhang Liu
Aiwei Liu
Xuming Hu
Lijie Wen
LRM
137
42
0
02 Oct 2024
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin
Zhaochun Ren
Junxiao Song
Zhihong Shao
Wanjia Zhao
...
Dejian Yang
Zhibin Gou
Z. F. Wu
Fuli Luo
Chong Ruan
AIMat
LRM
107
70
0
15 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
117
860
0
06 Aug 2024
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linfeng Ren
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
Xinchao Wang
VLM
MLLM
101
25
0
01 Aug 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
151
125
0
26 Jun 2024
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Guilherme Penedo
Hynek Kydlíček
Loubna Ben Allal
Anton Lozhkov
Margaret Mitchell
Colin Raffel
Leandro von Werra
Thomas Wolf
119
259
0
25 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
116
377
0
24 Jun 2024
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu
Weijia Shi
Xingyu Fu
Dan Roth
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Ranjay Krishna
LRM
81
70
0
13 Jun 2024
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Liangchen Luo
Yinxiao Liu
Rosanne Liu
Samrat Phatale
Harsh Lara
...
Lei Shu
Yun Zhu
Lei Meng
Jiao Sun
Abhinav Rastogi
LRM
93
188
0
05 Jun 2024
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Zou
Kai-Wei Chang
Wei Wang
SyDa
VLM
LRM
82
46
0
30 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Dinesh Manocha
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
144
38
0
24 May 2024
Calibrated Self-Rewarding Vision Language Models
Yiyang Zhou
Zhiyuan Fan
Dongjie Cheng
Sihan Yang
Zhaorun Chen
Chenhang Cui
Xiyao Wang
Yun Li
Linjun Zhang
Huaxiu Yao
VLM
113
34
0
23 May 2024
AlphaMath Almost Zero: Process Supervision without Process
Guoxin Chen
Minpeng Liao
Chengxi Li
Kai Fan
AIMat
LRM
62
112
0
06 May 2024
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Yuxi Xie
Anirudh Goyal
Wenyue Zheng
Min-Yen Kan
Timothy Lillicrap
Kenji Kawaguchi
Michael Shieh
ReLM
LRM
109
126
0
01 May 2024
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
Bozhi Luan
Hao Feng
Hong Chen
Yonghui Wang
Wen-gang Zhou
Houqiang Li
MLLM
94
17
0
15 Apr 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Jiaqi Wang
Yu Qiao
Dahua Lin
Feng Zhao
VLM
116
302
0
29 Mar 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao
Shengju Qian
Han Xiao
Guanglu Song
Zhuofan Zong
Letian Wang
Yu Liu
Hongsheng Li
VGen
LRM
MLLM
108
75
0
25 Mar 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Peng Gao
Hongsheng Li
71
250
0
21 Mar 2024
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
Ke Wang
Junting Pan
Weikang Shi
Zimu Lu
Mingjie Zhan
Hongsheng Li
84
187
0
22 Feb 2024
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
Ruibo Chen
Yihan Wu
Lichang Chen
Guodong Liu
Qi He
Tianyi Xiong
Chenxi Liu
Junfeng Guo
Heng-Chiao Huang
VLM
48
21
0
19 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
144
1,274
0
05 Feb 2024
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
Ming Li
Yong Zhang
Shwai He
Zhitao Li
Hongyu Zhao
Jianzong Wang
Ning Cheng
Dinesh Manocha
92
79
0
01 Feb 2024
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
Xiyao Wang
Yuhang Zhou
Xiaoyu Liu
Hongjin Lu
Yuancheng Xu
...
Taixi Lu
Gedas Bertasius
Mohit Bansal
Huaxiu Yao
Furong Huang
LRM
VLM
136
77
0
19 Jan 2024
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Peiyi Wang
Lei Li
Zhihong Shao
R. X. Xu
Damai Dai
Yifei Li
Deli Chen
Y. Wu
Zhifang Sui
AIMat
LRM
ALM
141
395
0
14 Dec 2023