Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.16502
Cited By
v1
v2
v3
v4 (latest)
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
50 / 700 papers shown
Title
TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models
Junlong Jia
Ying Hu
Xi Weng
Yiming Shi
Miao Li
...
Baichuan Zhou
Ziyu Liu
Jie Luo
Lei Huang
Ji Wu
97
9
0
20 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
119
58
0
17 May 2024
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
Jonathan Roberts
Kai Han
N. Houlsby
Samuel Albanie
91
16
0
14 May 2024
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
Chengyue Wu
Yixiao Ge
Qiushan Guo
Jiahao Wang
Zhixuan Liang
Zeyu Lu
Ying Shan
Ping Luo
MLLM
VLM
78
1
0
13 May 2024
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Jiachen Li
Xinyao Wang
Sijie Zhu
Chia-Wen Kuo
Lu Xu
Fan Chen
Jitesh Jain
Humphrey Shi
Longyin Wen
MLLM
MoE
100
33
0
09 May 2024
MAmmoTH2: Scaling Instructions from the Web
Xiang Yue
Tuney Zheng
Ge Zhang
Wenhu Chen
ALM
LRM
100
101
0
06 May 2024
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
Piotr Padlewski
Max Bain
Matthew Henderson
Zhongkai Zhu
Nishant Relan
...
Che Zheng
Cyprien de Masson dÁutume
Dani Yogatama
Mikel Artetxe
Yi Tay
VLM
152
27
0
03 May 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
105
177
0
03 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max Ku
Qian Liu
Wenhu Chen
VLM
MLLM
118
125
0
02 May 2024
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
147
45
0
29 Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
195
644
0
25 Apr 2024
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Bohao Li
Yuying Ge
Yi Chen
Yixiao Ge
Ruimao Zhang
Ying Shan
VLM
90
60
0
25 Apr 2024
What Makes Multimodal In-Context Learning Work?
Folco Bertini Baldassini
Mustafa Shukor
Matthieu Cord
Laure Soulier
Benjamin Piwowarski
138
23
0
24 Apr 2024
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
Davide Caffagni
Federico Cocchi
Nicholas Moratelli
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
118
47
0
23 Apr 2024
DesignProbe: A Graphic Design Benchmark for Multimodal Large Language Models
Jieru Lin
Danqing Huang
Tiejun Zhao
Dechen Zhan
Chin-Yew Lin
VLM
MLLM
62
3
0
23 Apr 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
144
142
0
22 Apr 2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Zhuofan Zong
Bingqi Ma
Dazhong Shen
Guanglu Song
Hao Shao
Dongzhi Jiang
Hongsheng Li
Yu Liu
MoE
108
51
0
19 Apr 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu
Yushi Hu
Bangzheng Li
Yu Feng
Haoyu Wang
Xudong Lin
Dan Roth
Noah A. Smith
Wei-Chiu Ma
Ranjay Krishna
VLM
LRM
MLLM
154
150
0
18 Apr 2024
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Aitor Ormazabal
Che Zheng
Cyprien de Masson dÁutume
Dani Yogatama
Deyu Fu
...
Yazheng Yang
Yi Tay
Yuqi Wang
Zhongkai Zhu
Zhihui Xie
LRM
VLM
ReLM
98
52
0
18 Apr 2024
Private Attribute Inference from Images with Vision-Language Models
Batuhan Tömekçe
Mark Vero
Robin Staab
Martin Vechev
VLM
PILM
94
10
0
16 Apr 2024
DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
Anna C. Doris
Daniele Grandi
Ryan Tomich
Md Ferdous Alam
Hyunmin Cheong
Faez Ahmed
71
20
0
11 Apr 2024
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu
Amirhossein Ajalloeian
Maria Antoniak
Chan Young Park
Shuyue Stella Li
Mehar Bhatia
Sahithya Ravi
Yulia Tsvetkov
Vered Shwartz
Yejin Choi
87
23
0
10 Apr 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Xingcheng Zhang
Jifeng Dai
Yuxin Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
116
127
0
09 Apr 2024
Rethinking How to Evaluate Language Model Jailbreak
Hongyu Cai
Arjun Arunasalam
Leo Y. Lin
Antonio Bianchi
Z. Berkay Celik
ALM
65
8
0
09 Apr 2024
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
Junpeng Liu
Yifan Song
Bill Yuchen Lin
Wai Lam
Graham Neubig
Yuanzhi Li
Xiang Yue
VLM
134
49
0
09 Apr 2024
Visually Descriptive Language Model for Vector Graphics Reasoning
Zhenhailong Wang
Joy Hsu
Xingyao Wang
Kuan-Hao Huang
Manling Li
Jiajun Wu
Heng Ji
MLLM
VLM
LRM
58
4
0
09 Apr 2024
MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification
Kai Sun
Yushi Bai
Ji Qi
Lei Hou
Juanzi Li
LRM
74
23
0
07 Apr 2024
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu
Ghazal Khalighinejad
Ollie Liu
Bhuwan Dhingra
Dani Yogatama
Robin Jia
Willie Neiswanger
101
26
0
01 Apr 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Jiaqi Wang
Yu Qiao
Dahua Lin
Feng Zhao
VLM
139
303
0
29 Mar 2024
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Jesse Atuhurra
Iqra Ali
Tatsuya Hiraoka
Hidetaka Kamigaito
Tomoya Iwakura
Taro Watanabe
108
1
0
29 Mar 2024
Croissant: A Metadata Format for ML-Ready Datasets
Mubashara Akhtar
Omar Benjelloun
Costanza Conforti
Pieter Gijsbers
Joan Giner-Miguelez
...
Slava Tykhonov
Joaquin Vanschoren
Jos van der Velde
Steffen Vogler
Carole-Jean Wu
82
40
0
28 Mar 2024
A Review of Multi-Modal Large Language and Vision Models
Kilian Carolan
Laura Fennelly
Alan F. Smeaton
VLM
186
28
0
28 Mar 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li
Yuechen Zhang
Chengyao Wang
Zhisheng Zhong
Yixin Chen
Ruihang Chu
Shaoteng Liu
Jiaya Jia
VLM
MLLM
MoE
133
238
0
27 Mar 2024
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang
Fei Pan
Junmo Kim
In So Kweon
Chengzhi Mao
85
11
1
27 Mar 2024
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning
Yiwu Zhong
Zi-Yuan Hu
Michael R. Lyu
Liwei Wang
68
1
0
27 Mar 2024
Assessment of Multimodal Large Language Models in Alignment with Human Values
Zhelun Shi
Zhipin Wang
Hongxing Fan
Zaibin Zhang
Lijun Li
Yongting Zhang
Zhen-fei Yin
Lu Sheng
Yu Qiao
Jing Shao
77
22
0
26 Mar 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Peng Gao
Hongsheng Li
107
253
0
21 Mar 2024
PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns
Yew Ken Chia
Vernon Toh Yan Han
Deepanway Ghosal
Lidong Bing
Soujanya Poria
LRM
ReLM
86
23
0
20 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
119
47
0
19 Mar 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi R. Fung
Haoyi Qiu
Mingyang Zhou
Shafiq Joty
Shih-Fu Chang
Chenhui Xu
AI4TS
123
32
0
18 Mar 2024
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
Rocktim Jyoti Das
Simeon Emilov Hristov
Haonan Li
Dimitar Iliyanov Dimitrov
Ivan Koychev
Preslav Nakov
CoGe
ELM
116
17
0
15 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
127
209
0
14 Mar 2024
MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension
Xingyu Lu
He Cao
Zijing Liu
Shengyuan Bai
Leqing Chen
Yuan Yao
Hai-Tao Zheng
Yu-Feng Li
HILM
79
10
0
13 Mar 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yueping Jiang
MLLM
83
20
0
12 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM
VLM
121
155
0
11 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
131
373
0
08 Mar 2024
Debiasing Multimodal Large Language Models
Yi-Fan Zhang
Weichen Yu
Qingsong Wen
Xue Wang
Zhang Zhang
Liang Wang
Rong Jin
Tien-Ping Tan
93
6
0
08 Mar 2024
How Far Are We from Intelligent Visual Deductive Reasoning?
Yizhe Zhang
Richard He Bai
Ruixiang Zhang
Jiatao Gu
Shuangfei Zhai
J. Susskind
Navdeep Jaitly
ReLM
LRM
99
17
0
07 Mar 2024
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
317
576
0
07 Mar 2024
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan
Zhixiang Liang
Zifan Ying
Daniel Kang
LLMAG
129
105
0
05 Mar 2024
Previous
1
2
3
...
12
13
14
Next