MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM, ELM, VLM

Papers citing "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

50 / 700 papers shown
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models
Lizhou Fan
Wenyue Hua
Xiang Li
Kaijie Zhu
Mingyu Jin
...
Haoyang Ling
Jinkui Chi
Jindong Wang
Xin Ma
Yongfeng Zhang
LRM
88
14
0
04 Mar 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
129
58
0
01 Mar 2024
Artwork Explanation in Large-scale Vision Language Models
Kazuki Hayashi
Yusuke Sakai
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
32
0
0
29 Feb 2024
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
Ameya Prabhu
Vishaal Udandarao
Philip Torr
Matthias Bethge
Adel Bibi
Samuel Albanie
94
4
0
29 Feb 2024
Towards Open-ended Visual Quality Comparison
Haoning Wu
Hanwei Zhu
Zicheng Zhang
Erli Zhang
Chaofeng Chen
...
Qiong Yan
Xiaohong Liu
Guangtao Zhai
Shiqi Wang
Weisi Lin
AAML
111
55
0
26 Feb 2024
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
Yi Zong
Xipeng Qiu
ELM, VLM
59
8
0
24 Feb 2024
Towards Robust Instruction Tuning on Multimodal Large Language Models
Wei Han
Hui Chen
Soujanya Poria
MLLM
78
1
0
22 Feb 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
208
9
0
22 Feb 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM, AIMat
165
282
0
21 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM, VLM
137
64
0
19 Feb 2024
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
Xiaoyu Tian
Junru Gu
Bailin Li
Yicheng Liu
Yang Wang
Chenxu Hu
Kun Zhan
Peng Jia
Xianpeng Lang
Hang Zhao
VLM
211
165
0
19 Feb 2024
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
Xuanyu Lei
Zonghan Yang
Xinrui Chen
Peng Li
Yang Liu
MLLM, LRM
103
38
0
19 Feb 2024
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
Guiming Hardy Chen
Shunian Chen
Ruifei Zhang
Junying Chen
Xiangbo Wu
Zhiyi Zhang
Zhihong Chen
Jianquan Li
Xiang Wan
Benyou Wang
VLM, SyDa
138
139
0
18 Feb 2024
Efficient Multimodal Learning from Data-centric Perspective
Muyang He
Yexin Liu
Boya Wu
Jianhao Yuan
Yueze Wang
Tiejun Huang
Bo Zhao
MLLM
90
88
0
18 Feb 2024
SciAgent: Tool-augmented Language Models for Scientific Reasoning
Yubo Ma
Zhibin Gou
Junheng Hao
Ruochen Xu
Shuohang Wang
...
Yujiu Yang
Yixin Cao
Aixin Sun
Hany Awadalla
Weizhu Chen
RALM, LRM, LLMAG
128
24
0
18 Feb 2024
Exploring Perceptual Limitation of Multimodal Large Language Models
Jiarui Zhang
Jinyi Hu
Mahyar Khayatkhoei
Filip Ilievski
Maosong Sun
LRM
81
11
0
12 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
241
116
0
08 Feb 2024
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
Zhenwen Liang
Kehan Guo
Gang Liu
Taicheng Guo
Yujun Zhou
Tianyu Yang
Jiajun Jiao
Renjie Pi
Jipeng Zhang
Xiangliang Zhang
ELM
86
24
0
06 Feb 2024
GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear
Robert Konrad
Nitish Padmanaban
J. G. Buckmaster
Kevin C. Boyle
Gordon Wetzstein
50
16
0
30 Jan 2024
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
Wai-Chung Kwan
Xingshan Zeng
Yuxin Jiang
Yufei Wang
Liangyou Li
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM, ELM
51
22
0
30 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM, MLLM
159
268
0
29 Jan 2024
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Yuxuan Sun
Hao Wu
Chenglu Zhu
Sunyi Zheng
Qizi Chen
...
Mengyue Zheng
Jingxiong Li
Xinheng Lyu
Tao Lin
Lin Yang
LM&MA
117
18
0
29 Jan 2024
Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties
J. Ong
Liyuan Jin
Kabilan Elangovan
Gilbert Yong San Lim
D. Lim
...
Xiang Chen
J. Chng
A. Than
Ken Junyang Goh
Daniel Ting
46
11
0
29 Jan 2024
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA
Yue Fan
Jing Gu
Kai-Qing Zhou
Qianqi Yan
Shan Jiang
Ching-Chen Kuo
Xinze Guan
Xin Eric Wang
103
8
0
29 Jan 2024
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
Zheqi He
Xinya Wu
Pengfei Zhou
Richeng Xuan
Guang Liu
Xi Yang
Qiannan Zhu
Hua Huang
ELM, LRM
108
20
0
25 Jan 2024
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh
Robert Lo
Lawrence Jang
Vikram Duvvur
Ming Chong Lim
Po-Yu Huang
Graham Neubig
Shuyan Zhou
Ruslan Salakhutdinov
Daniel Fried
138
0
0
24 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL, LRM
164
217
0
24 Jan 2024
The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models
Kian Ahrabian
Zhivar Sourati
Kexuan Sun
Jiarui Zhang
Yifan Jiang
Fred Morstatter
Jay Pujara
LRM
135
9
0
22 Jan 2024
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
Xiyao Wang
Yuhang Zhou
Xiaoyu Liu
Hongjin Lu
Yuancheng Xu
...
Taixi Lu
Gedas Bertasius
Mohit Bansal
Huaxiu Yao
Furong Huang
LRM, VLM
166
78
0
19 Jan 2024
COCO is "ALL" You Need for Visual Instruction Fine-tuning
Xiaotian Han
Yiqi Wang
Bohan Zhai
Quanzeng You
Hongxia Yang
VLM, MLLM
62
2
0
17 Jan 2024
MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova
Artem Chervyakov
Nikita Martynov
Anastasia Kozlova
Maria Tikhonova
...
Nikita Savushkin
Polina Mikhailova
Denis Dimitrov
Alexander Panchenko
Sergey Markov
ELM
97
12
0
09 Jan 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng
Boyu Gou
Jihyung Kil
Huan Sun
Yu-Chuan Su
MLLM, VLM, LLMAG
142
264
0
03 Jan 2024
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM, LRM
155
291
0
20 Dec 2023
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Jiahui Gao
Renjie Pi
Jipeng Zhang
Jiacheng Ye
Wanjun Zhong
...
Lanqing Hong
Jianhua Han
Hang Xu
Zhenguo Li
Lingpeng Kong
SyDa, ReLM, LRM
119
119
0
18 Dec 2023
An Evaluation of GPT-4V and Gemini in Online VQA
Mengchen Liu
Chongyan Chen
Danna Gurari
MLLM
123
7
0
17 Dec 2023
LLM-SQL-Solver: Can LLMs Determine SQL Equivalence?
Fuheng Zhao
Lawrence Lim
Ishtiyaque Ahmad
D. Agrawal
Amr El Abbadi
119
13
0
16 Dec 2023
Assessing GPT4-V on Structured Reasoning Tasks
Mukul Singh
J. Cambronero
Sumit Gulwani
Vu Le
Gust Verbruggen
LRM
73
13
0
13 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
106
133
0
11 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG, ELM, LRM
103
13
0
11 Dec 2023
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
Junyu Lu
Ruyi Gan
Di Zhang
Xiaojun Wu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
MLLM, VLM
96
17
0
08 Dec 2023
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
Xiujun Li
Yujie Lu
Zhe Gan
Jianfeng Gao
William Y. Wang
Yejin Choi
VLM, MLLM
79
3
0
29 Nov 2023
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi
Tobia Poppi
Federico Cocchi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
93
9
0
27 Nov 2023
Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?
Yusuke Sakai
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
75
1
0
15 Nov 2023
Explore Spurious Correlations at the Concept Level in Language Models for Text Classification
Yuhang Zhou
Paiheng Xu
Xiaoyu Liu
Bang An
Wei Ai
Furong Huang
LRM
187
28
0
15 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM, MLLM
176
517
0
06 Nov 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
77
24
0
31 Aug 2023
SVIT: Scaling up Visual Instruction Tuning
Bo Zhao
Boya Wu
Muyang He
Tiejun Huang
MLLM
94
128
0
09 Jul 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM, LRM
138
613
0
23 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM, MLLM
108
174
0
15 Jun 2023
Multimodal Chain-of-Thought Reasoning in Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Hai Zhao
George Karypis
Alexander J. Smola
LRM
142
466
0
02 Feb 2023