Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.06186
Cited By
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
10 January 2025
Omkar Thawakar
Dinura Dissanayake
Ketan More
Ritesh Thawkar
Ahmed Heakl
Noor Ahsan
Yuhao Li
Mohammed Zumri
Jean Lahoud
Rao Muhammad Anwer
Hisham Cholakkal
Ivan Laptev
Mubarak Shah
Fahad Shahbaz Khan
Salman Khan
VLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs"
37 / 37 papers shown
Title
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
Lai Wei
Yuting Li
Kaipeng Zheng
Chen Wang
Yue Wang
Linghe Kong
Lichao Sun
Weiran Huang
OffRL
ReLM
LRM
20
0
0
28 May 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM
LRM
VLM
40
0
0
28 May 2025
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving
Muxi Diao
Lele Yang
Hongbo Yin
Zhexu Wang
Yejie Wang
Daxin Tian
Kongming Liang
Zhanyu Ma
VLM
LRM
30
0
0
27 May 2025
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
Chao Huang
Benfeng Wang
Jie Wen
Chengliang Liu
Wei Wang
Li Shen
Xiaochun Cao
LRM
31
0
0
26 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
112
0
0
25 May 2025
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark
Sara Ghaboura
Ketan More
Wafa Alghallabi
Omkar Thawakar
Jorma T. Laaksonen
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
VLM
LRM
38
0
0
22 May 2025
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Fanrui Zhang
Dian Li
Qiang Zhang
Chenjun
sinbadliu
Junxiong Lin
Jiahong Yan
Jiawei Liu
Zheng-Jun Zha
OffRL
31
0
0
22 May 2025
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Chengwei Wei
Bin Wang
Jung-jae Kim
Nancy F. Chen
AuLLM
ReLM
LRM
27
0
0
21 May 2025
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
Kangan Qian
Sicong Jiang
Yang Zhong
Ziang Luo
Zilin Huang
...
Guang Li
Guang Chen
Hao Ye
Lijun Sun
Diange Yang
LRM
53
1
0
21 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
74
0
0
18 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS
LRM
49
0
0
16 May 2025
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
Yuan-Hong Liao
Sven Elflein
Liu He
Laura Leal-Taixe
Yejin Choi
Sanja Fidler
David Acuna
ReLM
LRM
VLM
369
1
0
21 Apr 2025
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning
Baining Zhao
Ziyi Wang
Jianjie Fang
Chen Gao
Fanhang Man
Jinqiang Cui
Xin Wang
Xinlei Chen
Yong Li
Wenwu Zhu
LM&Ro
VLM
LRM
85
5
0
17 Apr 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
Yikun Wang
Siyin Wang
Qinyuan Cheng
Zhaoye Fei
Liang Ding
Qipeng Guo
Dacheng Tao
Xipeng Qiu
LRM
37
2
0
12 Apr 2025
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Yukun Qi
Yiming Zhao
Y. Zeng
Xikun Bao
Wenjie Huang
Lin Yen-Chen
Zehui Chen
Jie Zhao
Zhongang Qi
Feng Zhao
LRM
87
3
0
10 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Xinze Wang
Zhiyong Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
LRM
VLM
122
12
0
10 Apr 2025
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang
Long Chan
Jinlong Liu
Wanggui He
Hao Jiang
Mingli Song
Jingyuan Chen
Chang Yao
Jie Song
LRM
39
1
0
31 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
115
0
0
29 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Weinan Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
113
7
0
27 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
124
35
0
27 Mar 2025
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti
Massimiliano Mancini
Enrico Fini
Yiming Wang
Paolo Rota
Elisa Ricci
VLM
Presented at
ResearchTrend Connect | VLM
on
07 May 2025
125
1
0
27 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
148
0
0
26 Mar 2025
Towards Online Multi-Modal Social Interaction Understanding
Xuzhao Li
Shijian Deng
Bolin Lai
Weiguo Pian
James M. Rehg
Yapeng Tian
100
0
0
25 Mar 2025
Video-T1: Test-Time Scaling for Video Generation
Fan Liu
Hanyang Wang
Yimo Cai
Kaiyan Zhang
Xiaohang Zhan
Yueqi Duan
DiffM
VGen
113
5
0
24 Mar 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Zhiyu Lin
Yifei Gao
Xian Zhao
Yunfan Yang
Jitao Sang
LRM
87
5
0
23 Mar 2025
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Philip S. Yu
Xuming Hu
Qingsong Wen
278
7
0
23 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
110
45
0
17 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Yang Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
295
3
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yize Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
114
23
0
16 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
Junlong Li
Xiang Yue
Bo Li
Ping Nie
Dayou Du
Wenhu Chen
LRM
110
3
0
13 Mar 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang
Xiaoxuan He
Hongkun Pan
Xiyan Jiang
Yan Deng
...
Dacheng Yin
Fengyun Rao
Minfeng Zhu
Bo Zhang
Wei Chen
VLM
LRM
89
52
1
13 Mar 2025
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Ketan More
Omkar Thawakar
Ritesh Thawkar
...
Fahad Shahbaz Khan
Hisham Cholakkal
Ivan Laptev
Rao Muhammad Anwer
Salman Khan
LRM
86
3
0
13 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
93
85
0
09 Mar 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Kaixin Cai
...
Yu-Jie Yuan
Jiawei Han
Lanqing Hong
Hang Xu
Xiaodan Liang
ReLM
LRM
120
9
0
08 Mar 2025
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Hengguang Zhou
Xirui Li
Ruochen Wang
Minhao Cheng
Tianyi Zhou
Cho-Jui Hsieh
OffRL
LRM
ReLM
108
43
0
07 Mar 2025
Boosting Multimodal Reasoning with Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
Jianhua Tao
LRM
130
11
0
04 Feb 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Xinzhe Ni
Zicheng Lin
...
Yiyao Yu
C. Shi
Ruihang Chu
Jin Zeng
Yujiu Yang
LRM
94
17
0
08 Jan 2025
1