ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.14404
  4. Cited By
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
v1v2 (latest)

ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations

20 May 2025
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
    LRM
ArXiv (abs)PDFHTML

Papers citing "ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations"

49 / 49 papers shown
Title
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLMVLM
158
89
1
14 Apr 2025
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Wulin Xie
Yize Zhang
Chaoyou Fu
Yang Shi
Bingyan Nie
Hongkai Chen
Zheng Zhang
Liang Wang
Tieniu Tan
90
3
0
04 Apr 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
Xiangyu Zhao
Peiyuan Zhang
Kexian Tang
Hao Li
Zicheng Zhang
...
Guangtao Zhai
Junchi Yan
Hua Yang
Xue Yang
Haodong Duan
VLMLRM
123
5
0
03 Apr 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Weinan Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&RoLRM
163
7
0
27 Mar 2025
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Kexian Tang
Junyao Gao
Yanhong Zeng
Haodong Duan
Yanan Sun
Zhening Xing
Wenran Liu
Kaifeng Lyu
Kai-xiang Chen
ELMLRM
122
8
0
25 Mar 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Zhiyu Lin
Yifei Gao
Xian Zhao
Yunfan Yang
Jitao Sang
LRM
134
5
0
23 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yize Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
165
27
0
16 Mar 2025
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity
Jing Bi
Junjia Guo
Susan Liang
Guangyu Sun
Luchuan Song
...
Jinxi He
Jiarui Wu
Ali Vosoughi
Chong Chen
Chenliang Xu
LRM
102
7
0
14 Mar 2025
CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
Guanghao Zhang
Tao Zhong
Yan Xia
Zhelun Yu
Haoyang Li
...
Fangxun Shu
Mushui Liu
D. She
Yi Wang
Hao Jiang
LRM
71
1
0
07 Mar 2025
Qwen2.5-VL Technical Report
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
319
546
0
20 Feb 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanwei Li
Yu Qi
...
Shen Yan
Bo Zhang
Chaoyou Fu
Peng Gao
Hongsheng Li
MLLMLRM
119
32
0
13 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLMVLMOffRLAI4TSLRM
373
1,692
0
22 Jan 2025
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Hao Fei
Shengqiong Wu
Wei Ji
Hao Zhang
Hao Fei
Mong Li Lee
Wynne Hsu
LRMVGen
116
78
0
08 Jan 2025
Why Do Large Language Models (LLMs) Struggle to Count Letters?
Why Do Large Language Models (LLMs) Struggle to Count Letters?
Tairan Fu
Raquel Ferrando
Javier Conde
Carlos Arriaga
Pedro Reviriego
58
10
0
19 Dec 2024
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng
Qiguang Chen
Jin Zhang
Hao Fei
Xiaocheng Feng
Wanxiang Che
Min Li
L. Qin
VLMMLLMLRM
161
7
0
17 Dec 2024
MageBench: Bridging Large Multimodal Models to Agents
MageBench: Bridging Large Multimodal Models to Agents
Miaosen Zhang
Qi Dai
Yifan Yang
Jianmin Bao
Dongdong Chen
Kai Qiu
Chong Luo
Xin Geng
B. Guo
LRMLLMAG
88
3
0
05 Dec 2024
Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models
Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models
Qingbin Liu
Yumeng Li
Boyuan Xiao
Yichang Jian
Ziang Qin
Tianjia Shao
Yao-Xiang Ding
Kun Zhou
LRMMLLM
157
4
0
27 Nov 2024
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems
Nan Xu
Xuezhe Ma
LRM
128
5
0
18 Oct 2024
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Haojun Shi
Suyu Ye
Xinyu Fang
Chuanyang Jin
Leyla Isik
Yen-Ling Kuo
Tianmin Shu
LLMAG
113
14
0
22 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLMSyDaVLM
115
783
0
06 Aug 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MAVLM
142
166
0
16 Jul 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All
  Tools
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
:
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
111
610
0
18 Jun 2024
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal
  Language Models
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu
Weijia Shi
Xingyu Fu
Dan Roth
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Ranjay Krishna
LRM
81
65
0
13 Jun 2024
M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal
  Chain-of-Thought
M3^33CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought
Qiguang Chen
Libo Qin
Jin Zhang
Zhi Chen
Xiao Xu
Wanxiang Che
LRM
98
60
0
26 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
186
309
0
16 May 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
113
168
0
01 Apr 2024
A Review of Multi-Modal Large Language and Vision Models
A Review of Multi-Modal Large Language and Vision Models
Kilian Carolan
Laura Fennelly
Alan F. Smeaton
VLM
158
28
0
28 Mar 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic
  Visual-Linguistic Tasks
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLMMLLM
250
1,126
0
21 Dec 2023
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
Yijun Yang
Tianyi Zhou
Kanxue Li
Dapeng Tao
Lusong Li
Li Shen
Xiaodong He
Jing Jiang
Yuhui Shi
LLMAGLM&Ro
62
43
0
28 Nov 2023
Multimodal Large Language Models: A Survey
Multimodal Large Language Models: A Survey
Jiayang Wu
Wensheng Gan
Zefeng Chen
Shicheng Wan
Philip S. Yu
83
187
0
22 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
112
283
0
21 Nov 2023
Holistic Evaluation of Text-To-Image Models
Holistic Evaluation of Text-To-Image Models
Tony Lee
Michihiro Yasunaga
Chenlin Meng
Yifan Mai
Joon Sung Park
...
Jun-Yan Zhu
Fei-Fei Li
Jiajun Wu
Stefano Ermon
Percy Liang
207
138
0
07 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLMMLLM
93
502
0
06 Nov 2023
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative
  Vokens
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
Kaizhi Zheng
Xuehai He
Xin Eric Wang
MLLM
102
96
0
03 Oct 2023
A Configurable Library for Generating and Manipulating Maze Datasets
A Configurable Library for Generating and Manipulating Maze Datasets
Michael Ivanitskiy
Rusheb Shah
Alex F Spies
Tilman Rauker
Dan Valentine
...
Lucia Quirke
Chris Mathwin
Guillaume Corlouer
Cecilia G. Diniz Behn
Samy Wu Fung Colorado School of Mines
99
11
0
19 Sep 2023
A Comprehensive Overview of Large Language Models
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
101
595
0
12 Jul 2023
A Survey on Evaluation of Large Language Models
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELMLM&MAALM
128
1,686
0
06 Jul 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLMObjDVLM
107
755
0
26 Jun 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text
  Documents
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Hugo Laurenccon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
122
242
0
21 Jun 2023
InstructBLIP: Towards General-purpose Vision-Language Models with
  Instruction Tuning
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLMVLM
119
2,067
0
11 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large
  Language Models
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLMMLLM
156
2,033
0
20 Apr 2023
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDaVLMMLLM
539
4,861
0
17 Apr 2023
PaLM-E: An Embodied Multimodal Language Model
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
109
1,648
0
06 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALMPILM
1.5K
13,247
0
27 Feb 2023
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
392
3,542
0
29 Apr 2022
ChartQA: A Benchmark for Question Answering about Charts with Visual and
  Logical Reasoning
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
Ahmed Masry
Do Xuan Long
J. Tan
Shafiq Joty
Enamul Hoque
AIMat
132
660
0
19 Mar 2022
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
458
15,665
0
20 Dec 2021
JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method
JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method
Vishwanath A. Sindagi
R. Yasarla
Vishal M. Patel
73
227
0
07 Apr 2020
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
413
43,667
0
01 May 2014
1