Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.20330
Cited By
Are We on the Right Way for Evaluating Large Vision-Language Models?
29 March 2024
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Zehui Chen
Haodong Duan
Jiaqi Wang
Yu Qiao
Dahua Lin
Feng Zhao
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Are We on the Right Way for Evaluating Large Vision-Language Models?"
50 / 189 papers shown
Title
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
Han Bao
Yue Huang
Yanbo Wang
Jiayi Ye
Xiangqi Wang
Xiuying Chen
Mohamed Elhoseiny
Xuzhi Zhang
Mohamed Elhoseiny
Xiangliang Zhang
47
7
0
28 Oct 2024
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
Fengbin Zhu
Ziyang Liu
Xiang Yao Ng
Haohui Wu
Luu Anh Tuan
Fuli Feng
Chao Wang
Huanbo Luan
Tat-Seng Chua
VLM
35
3
0
25 Oct 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Haodong Duan
Conghui He
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
34
7
0
23 Oct 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Conghui He
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
48
26
0
22 Oct 2024
Mitigating Object Hallucination via Concentric Causal Attention
Yun Xing
Yiheng Li
Ivan Laptev
Shijian Lu
47
18
0
21 Oct 2024
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
Yin Xie
Kaicheng Yang
Ninghua Yang
Weimo Deng
Xiangzi Dai
...
Yumeng Wang
Xiang An
Yongle Zhao
Ziyong Feng
Jiankang Deng
MLLM
VLM
45
1
0
18 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
71
21
0
18 Oct 2024
Can MLLMs Understand the Deep Implication Behind Chinese Images?
Chenhao Zhang
Xi Feng
Yuelin Bai
Xinrun Du
Jinchang Hou
...
Min Yang
Wenhao Huang
Chenghua Lin
Ge Zhang
Shiwen Ni
ELM
VLM
38
4
0
17 Oct 2024
H2OVL-Mississippi Vision Language Models Technical Report
Shaikat Galib
Shanshan Wang
Guanshuo Xu
Pascal Pfeiffer
Ryan Chesler
Mark Landry
Sri Satish Ambati
MLLM
VLM
18
2
0
17 Oct 2024
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
Fengji Zhang
Linquan Wu
Huiyu Bai
Guancheng Lin
Xiao Li
Xiao Yu
Yue Wang
Bei Chen
Jacky Keung
MLLM
ELM
LRM
32
0
0
16 Oct 2024
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Botian Jiang
Lei Li
Xiaonan Li
Zhaowei Li
Xiachong Feng
Lingpeng Kong
Qiang Liu
Xipeng Qiu
41
2
0
16 Oct 2024
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto
Yang Liu
Tongtong Cao
Y. Ren
Liu Bingbing
55
5
0
14 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
54
10
0
14 Oct 2024
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua
Yunlong Tang
Ziyun Zeng
Liangliang Cao
Zhengyuan Yang
Hangfeng He
Chenliang Xu
Jiebo Luo
VLM
CoGe
44
9
0
13 Oct 2024
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Yue Yang
S. Zhang
Wenqi Shao
Kaipeng Zhang
Yi Bin
Yu Wang
Ping Luo
30
3
0
11 Oct 2024
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
Yuqing Nie
Chong Wang
Kaidi Wang
Guoai Xu
Guosheng Xu
Haoyu Wang
OffRL
136
1
0
11 Oct 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
Qidong Huang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Jiaqi Wang
Dahua Lin
Weiming Zhang
Nenghai Yu
57
5
0
09 Oct 2024
Intriguing Properties of Large Language and Vision Models
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Yechan Hwang
Ho-Jin Choi
LRM
VLM
43
0
0
07 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
84
26
0
04 Oct 2024
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Kenza Amara
Lukas Klein
Carsten T. Lüth
Paul Jäger
Hendrik Strobelt
Mennatallah El-Assady
30
1
0
02 Oct 2024
MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark
Elliot L. Epstein
Kaisheng Yao
Jing Li
Xinyi Bai
Hamid Palangi
LRM
47
0
0
26 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
36
16
0
26 Sep 2024
Phantom of Latent for Large Language and Vision Models
Byung-Kwan Lee
Sangyun Chung
Chae Won Kim
Beomchan Park
Yong Man Ro
VLM
LRM
41
6
0
23 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
51
11
0
23 Sep 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
42
9
0
21 Sep 2024
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
Jiashuo Sun
Jihai Zhang
Yucheng Zhou
Zhaochen Su
Xiaoye Qu
Yu Cheng
48
12
0
21 Sep 2024
Constructive Apraxia: An Unexpected Limit of Instructible Vision-Language Models and Analog for Human Cognitive Disorders
David A. Noever
S. M. Noever
27
0
0
17 Sep 2024
POINTS: Improving Your Vision-language Model with Affordable Strategies
Yuan Liu
Zhongyin Zhao
Ziyuan Zhuang
Le Tian
Xiao Zhou
Jie Zhou
VLM
43
5
0
07 Sep 2024
Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability
Chia-Yu Hsu
Wenwen Li
Sizhe Wang
42
12
0
31 Aug 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
50
88
0
29 Aug 2024
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
50
20
0
28 Aug 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang
Huanyu Zhang
Haochen Tian
Chaoyou Fu
Shuangqing Zhang
...
Qingsong Wen
Zhang Zhang
L. Wang
Rong Jin
Tieniu Tan
OffRL
69
36
0
23 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
42
61
0
22 Aug 2024
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Jonathan Roberts
Kai Han
Samuel Albanie
33
3
0
21 Aug 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
44
90
0
16 Aug 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Yunhang Shen
Meng Zhao
...
Ran He
Rongrong Ji
Yunsheng Wu
Caifeng Shan
Xing Sun
MLLM
44
80
0
09 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
58
554
0
06 Aug 2024
Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Yang Wang
Naomi Saphra
39
10
0
22 Jul 2024
TraveLLM: Could you plan my new public transit route in face of a network disruption?
Bowen Fang
Zixiao Yang
Shukai Wang
Xuan Di
16
3
0
20 Jul 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
97
76
0
17 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
39
115
0
16 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
42
14
0
08 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLM
VLM
54
5
0
06 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
45
100
0
03 Jul 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao
Qiuna Tan
Guanting Dong
Minhui Wu
Chong Sun
...
Yida Xu
Muxi Diao
Zhimin Bao
Chen Li
Honggang Zhang
VLM
LRM
44
31
0
01 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
108
13
0
01 Jul 2024
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang
Liang Chen
Taian Guo
Fu Zeng
Yusheng Zhao
...
Wei Ju
Luchen Liu
Tianyu Liu
Baobao Chang
Ming Zhang
43
5
0
29 Jun 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Tao Zhang
Xiangtai Li
Hao Fei
Haobo Yuan
Shengqiong Wu
Shunping Ji
Chen Change Loy
Shuicheng Yan
LRM
MLLM
VLM
49
48
0
27 Jun 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao
Xiangtai Li
Haodong Duan
Haian Huang
Yining Li
Kai Chen
Hua Yang
VLM
MLLM
45
10
0
25 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
48
281
0
24 Jun 2024
Previous
1
2
3
4
Next