ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.06805
  4. Cited By
Exploring the Reasoning Abilities of Multimodal Large Language Models
  (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

10 January 2024
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
    LRM
ArXivPDFHTML

Papers citing "Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning"

50 / 61 papers shown
Title
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
J. A. Zhang
Chuanqi Cheng
Y. Liu
W. Liu
Jian Luan
Rui Yan
27
1
0
28 Apr 2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning
Fast-Slow Thinking for Large Vision-Language Model Reasoning
W. L. Xiao
Leilei Gan
Weilong Dai
Wanggui He
Ziwei Huang
...
Fangxun Shu
Zhelun Yu
Peng Zhang
Hao Jiang
Fei Wu
ReLM
LRM
AI4CE
155
1
0
25 Apr 2025
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Zhikai Wang
Jiashuo Sun
W. Zhang
Zhiqiang Hu
Xin Li
F. Wang
Deli Zhao
VLM
LRM
75
0
0
24 Apr 2025
PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving
PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving
Zeyu Zhang
Z. Chen
Zicheng Zhang
Yuze Sun
Yuan Tian
Ziheng Jia
Chunyi Li
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
MLLM
36
0
0
15 Apr 2025
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Hyunwoo Oh
Yezi Liu
Tamoghno Das
Mohsen Imani
VLM
LRM
34
0
0
15 Apr 2025
Translating Multimodal AI into Real-World Inspection: TEMAI Evaluation Framework and Pathways for Implementation
Translating Multimodal AI into Real-World Inspection: TEMAI Evaluation Framework and Pathways for Implementation
Z. Li
Jinzhi Deng
Haibing Ma
Chi Zhang
Dan Xiao
25
0
0
31 Mar 2025
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Li Lyna Zhang
Longxi Gao
Mengwei Xu
LRM
37
0
0
21 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
44
1
0
19 Mar 2025
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Z. Wang
Yurui Dong
Fuwen Luo
Minyuan Ruan
Zhili Cheng
C. L. P. Chen
Peng Li
Yang Liu
LRM
87
0
0
13 Mar 2025
Interpretable and Robust Dialogue State Tracking via Natural Language Summarization with LLMs
Rafael Carranza
Mateo Alejandro Rojas
57
0
0
11 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
88
31
0
10 Mar 2025
Fine-Grained Retrieval-Augmented Generation for Visual Question Answering
Fine-Grained Retrieval-Augmented Generation for Visual Question Answering
Zhengxuan Zhang
Yin Wu
Yuyu Luo
Nan Tang
35
0
0
28 Feb 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
194
0
0
18 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
121
9
0
05 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
68
8
0
04 Feb 2025
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
H. Malik
Fahad Shamshad
Muzammal Naseer
Karthik Nandakumar
F. Khan
Salman Khan
AAML
MLLM
VLM
68
0
0
03 Feb 2025
Position: Empowering Time Series Reasoning with Multimodal LLMs
Position: Empowering Time Series Reasoning with Multimodal LLMs
Yaxuan Kong
Yiyuan Yang
Shiyu Wang
Chenghao Liu
Yuxuan Liang
Ming Jin
Stefan Zohren
Dan Pei
Y. Liu
Qingsong Wen
AI4TS
LRM
78
2
0
03 Feb 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
88
11
0
06 Jan 2025
The BrowserGym Ecosystem for Web Agent Research
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier De Chezelles
Maxime Gasse
Alexandre Lacoste
Alexandre Drouin
Massimo Caccia
...
Siva Reddy
Quentin Cappart
Graham Neubig
Ruslan Salakhutdinov
Nicolas Chapados
LLMAG
101
9
0
06 Dec 2024
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language
  Model Benchmarking
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking
Harsha Vardhan Khurdula
Basem Rizk
Indus Khaitan
Janit Anjaria
Aviral Srivastava
Rajvardhan Khaitan
ELM
VLM
LRM
69
0
0
20 Nov 2024
See it, Think it, Sorted: Large Multimodal Models are Few-shot Time
  Series Anomaly Analyzers
See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers
Jiaxin Zhuang
Leon Yan
Zhenwei Zhang
Ruiqi Wang
Jiawei Zhang
Yuantao Gu
AI4TS
29
7
0
04 Nov 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst
Tim Nelson Tobiasch
Lukas Helff
Inga Ibs
Wolfgang Stammer
D. Dhami
Constantin Rothkopf
Kristian Kersting
CoGe
ReLM
VLM
LRM
63
1
0
25 Oct 2024
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving
  Robust Reasoning in Large Language Models via Abstraction
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction
Kaiqiao Han
Tianqing Fang
Zhaowei Wang
Y. Song
Mark Steedman
LRM
32
4
0
15 Oct 2024
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal
  Large Language Models Via Error Detection
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Hang Li
B. Li
...
Kun Wang
Hui Xiong
Philip S. Yu
Xuming Hu
Qingsong Wen
LRM
33
13
0
06 Oct 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced
  Mathematical Reasoning
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM
AI4CE
23
13
0
19 Sep 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual
  Instruction Tuning
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
M. Zhang
Xunliang Cai
LRM
35
6
0
21 Aug 2024
CXSimulator: A User Behavior Simulation using LLM Embeddings for
  Web-Marketing Campaign Assessment
CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment
Akira Kasuga
Ryo Yonetani
31
1
0
31 Jul 2024
CoDefeater: Using LLMs To Find Defeaters in Assurance Cases
CoDefeater: Using LLMs To Find Defeaters in Assurance Cases
Usman Gohar
Michael C. Hunter
Robyn R. Lutz
Myra B. Cohen
OffRL
29
6
0
18 Jul 2024
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Daoyuan Chen
Haibin Wang
Yilun Huang
Ce Ge
Yaliang Li
Bolin Ding
Jingren Zhou
VLM
SyDa
61
0
0
16 Jul 2024
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya
Agney S Talwarr
Vatsal Gupta
Tushar Kataria
Dan Roth
Vivek Gupta
LRM
61
2
0
15 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey
  from Co-Development Perspective
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
57
5
0
11 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
54
0
0
04 Jul 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via
  Data Synthesis
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian-Yu Guan
Wei Wu
Rui Yan
LRM
37
10
0
28 Jun 2024
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Bahare Fatemi
Mehran Kazemi
Anton Tsitsulin
Karishma Malkan
Jinyeong Yim
John Palowitch
Sungyong Seo
Jonathan J. Halcrow
Bryan Perozzi
LRM
35
26
0
13 Jun 2024
What do MLLMs hear? Examining reasoning with text and sound components
  in Multimodal Large Language Models
What do MLLMs hear? Examining reasoning with text and sound components in Multimodal Large Language Models
Enis Berk Çoban
Michael I. Mandel
Johanna Devaney
AuLLM
LRM
38
0
0
07 Jun 2024
Evaluating Vision-Language Models on Bistable Images
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
41
0
0
29 May 2024
GameVLM: A Decision-making Framework for Robotic Task Planning Based on
  Visual Language Models and Zero-sum Games
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games
Aoran Mei
Jianhua Wang
Guo-Niu Zhu
Zhongxue Gan
34
6
0
22 May 2024
ALCM: Autonomous LLM-Augmented Causal Discovery Framework
ALCM: Autonomous LLM-Augmented Causal Discovery Framework
Elahe Khatibi
Mahyar Abbasian
Zhongqi Yang
Iman Azimi
Amir M. Rahmani
64
12
0
02 May 2024
PerkwE_COQA: Enhanced Persian Conversational Question Answering by
  combining contextual keyword extraction with Large Language Models
PerkwE_COQA: Enhanced Persian Conversational Question Answering by combining contextual keyword extraction with Large Language Models
Pardis Moradbeiki
Nasser Ghadiri
37
0
0
08 Apr 2024
Facial Affective Behavior Analysis with Instruction Tuning
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li
Anh Dao
Wentao Bao
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
CVBM
53
15
0
07 Apr 2024
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive
  Speech Detection via Large Language Models
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
H. Nghiem
Hal Daumé
31
1
0
18 Mar 2024
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language
  Models
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models
Lizhou Fan
Wenyue Hua
Xiang Li
Kaijie Zhu
Mingyu Jin
...
Haoyang Ling
Jinkui Chi
Jindong Wang
Xin Ma
Yongfeng Zhang
LRM
37
14
0
04 Mar 2024
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Haogeng Liu
Quanzeng You
Xiaotian Han
Yiqi Wang
Bohan Zhai
Yongfei Liu
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
MLLM
44
9
0
03 Mar 2024
Explaining latent representations of generative models with large
  multimodal models
Explaining latent representations of generative models with large multimodal models
Mengdan Zhu
Zhenke Liu
Bo Pan
Abhinav Angirekula
Liang Zhao
32
2
0
02 Feb 2024
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form
  Egocentric Videos
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
41
15
0
07 Dec 2023
A Survey on Image-text Multimodal Models
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
Instruction Tuning with GPT-4
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
159
579
0
06 Apr 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for
  Generative Large Language Models
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark J. F. Gales
HILM
LRM
152
391
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,229
0
30 Jan 2023
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of
  Chain-of-Thought
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
121
275
0
03 Oct 2022
12
Next