ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

18 June 2024
Bingchen Zhao
Yongshuo Zong
Letian Zhang
Timothy Hospedales
    VLM
arXiv: 2406.12742

Papers citing "Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning"

VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism
Congzhi Zhang
Jiawei Peng
Zhenglin Wang
Yilong Lai
Haowen Sun
Heng Chang
Fei Ma
Weijiang Yu
ReLM LRM
16
0
0
10 Jun 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
Jingyun Zhang
Chuanqi Cheng
Yang Liu
Wen Liu
Jian Luan
Rui Yan
70
4
0
28 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM VLM
221
132
1
14 Apr 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
108
3
0
18 Mar 2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
Seungwon Lim
Sungwoong Kim
Jihwan Yu
Sungjae Lee
Jiwan Chung
Youngjae Yu
131
1
0
18 Mar 2025
Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models
Boyu Jia
Junzhe Zhang
Huixuan Zhang
Xiaojun Wan
LRM
109
2
0
03 Mar 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Ziyi Wang
Yang Zhou
ELM VLM LRM
120
1
0
27 Feb 2025
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Nilay Yilmaz
Maitreya Patel
Yiran Luo
Tejas Gokhale
Chitta Baral
Suren Jayasuriya
Yezhou Yang
LRM
106
0
0
25 Feb 2025
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Z. Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
106
7
0
02 Oct 2024
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
Gautier Dagan
Olga Loginova
Anil Batra
CoGe
135
1
0
17 Sep 2024
MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
Siwei Wu
Kang Zhu
Yu Bai
Yiming Liang
Yizhi Li
...
Xingwei Qu
Xuxin Cheng
Ge Zhang
Wenhao Huang
Chenghua Lin
VLM
91
2
0
24 Jul 2024
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Hyungjun Yoon
Biniyam Aschalew Tolera
Taesik Gong
Kimin Lee
Sung-Ju Lee
94
8
0
15 Jul 2024
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa
John Lafferty
59
3
0
26 May 2024