MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
    OSLM
    ELM
    VLM

Papers citing "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

50 / 587 papers shown
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLM
VLM
54
5
0
06 Jul 2024
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen
Yichao Du
Zichen Wen
Yiyang Zhou
Chenhang Cui
...
Jiawei Zhou
Zhuokai Zhao
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
MLLM
65
29
0
05 Jul 2024
On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton
Noah Y. Siegel
János Kramár
Jonah Brown-Cohen
Samuel Albanie
...
Rishabh Agarwal
David Lindner
Yunhao Tang
Noah D. Goodman
Rohin Shah
ELM
43
31
0
05 Jul 2024
Unified Interpretation of Smoothing Methods for Negative Sampling Loss Functions in Knowledge Graph Embedding
Xincan Feng
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
44
1
0
05 Jul 2024
Smart Vision-Language Reasoners
Denisa Roberts
Lucas Roberts
VLM
ReLM
LRM
56
4
0
05 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Chenyu You
Jimmy Huang
ELM
ALM
31
28
0
04 Jul 2024
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks
Florian Schneider
Sunayana Sitaram
VLM
50
7
0
04 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
47
100
0
03 Jul 2024
VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values
Zhe Hu
Yixiao Ren
Jing Li
Yu Yin
VLM
44
4
0
03 Jul 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li
Yuqian Yuan
Jian Liu
Dongqi Tang
Song Wang
Jie Qin
Jianke Zhu
Lei Zhang
MLLM
37
53
0
02 Jul 2024
Synthetic Multimodal Question Generation
Ian Wu
Sravan Jayanthi
Vijay Viswanathan
Simon Rosenberg
Sina Pakazad
Tongshuang Wu
Graham Neubig
50
2
0
02 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
53
2
0
02 Jul 2024
VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs
Qiucheng Wu
Handong Zhao
Michael Stephen Saxon
T. Bui
William Yang Wang
Yang Zhang
Shiyu Chang
CoGe
48
5
0
02 Jul 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao
Qiuna Tan
Guanting Dong
Minhui Wu
Chong Sun
...
Yida Xu
Muxi Diao
Zhimin Bao
Chen Li
Honggang Zhang
VLM
LRM
47
32
0
01 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
113
13
0
01 Jul 2024
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang
Liang Chen
Taian Guo
Fu Zeng
Yusheng Zhao
...
Wei Ju
Luchen Liu
Tianyu Liu
Baobao Chang
Ming Zhang
46
5
0
29 Jun 2024
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Sukmin Yun
Haokun Lin
Rusiru Thushara
Mohammad Qazim Bhat
Yongxin Wang
...
Timothy Baldwin
Zhengzhong Liu
Eric P. Xing
Xiaodan Liang
Zhiqiang Shen
54
10
0
28 Jun 2024
LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression
Jieneng Chen
Luoxin Ye
Ju He
Zhao-Yang Wang
Daniel Khashabi
Alan Yuille
VLM
27
5
0
28 Jun 2024
STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical
Guohao Sun
Can Qin
Huazhu Fu
Linwei Wang
Zhiqiang Tao
LM&MA
40
3
0
28 Jun 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
LRM
47
10
0
28 Jun 2024
MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?
Jinming Li
Yichen Zhu
Zhiyuan Xu
Jindong Gu
Minjie Zhu
Xin Liu
Ning Liu
Yaxin Peng
Feifei Feng
Jian Tang
LRM
LM&Ro
36
7
0
28 Jun 2024
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Zhi-Long Ji
Jin-Feng Bai
Zhen-Ru Pan
Fan-Hu Zeng
Jian Xu
Jia-Xin Zhang
Cheng-Lin Liu
ELM
51
12
0
28 Jun 2024
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Junying Chen
Ruyi Ouyang
Anningzhe Gao
Shunian Chen
Guiming Hardy Chen
...
Zhenyang Cai
Ke Ji
Guangjun Yu
Xiang Wan
Benyou Wang
MedIm
LM&MA
51
32
0
27 Jun 2024
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
Shubhankar Singh
Purvi Chaurasia
Yerram Varun
Pranshu Pandya
Vatsal Gupta
Vivek Gupta
Dan Roth
36
4
0
27 Jun 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
Céline Hudelot
Pierre Colombo
VLM
73
22
0
27 Jun 2024
S3: A Simple Strong Sample-effective Multimodal Dialog System
Elisei Rykov
Egor Malkershin
Alexander Panchenko
28
0
0
26 Jun 2024
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Wenhao Shi
Zhiqiang Hu
Yi Bin
Junhua Liu
Yang Yang
See-Kiong Ng
Lidong Bing
Roy Ka-Wei Lee
SyDa
MLLM
LRM
34
41
0
25 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
53
287
0
24 Jun 2024
Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts
Aditya Sharma
Michael Saxon
William Yang Wang
VLM
44
2
0
24 Jun 2024
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
Mingrui Wu
Jiayi Ji
Oucheng Huang
Jiale Li
Yuhang Wu
Xiaoshuai Sun
Rongrong Ji
53
8
0
24 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Shri Kiran Srinivasan
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
92
23
0
23 Jun 2024
Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Joanna Matthiesen
Kevin A. Smith
J. Tenenbaum
ELM
LRM
41
7
0
22 Jun 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Yuxuan Qiao
Haodong Duan
Xinyu Fang
Junming Yang
Lin Chen
Songyang Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LRM
45
19
0
20 Jun 2024
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification
Gregor Geigle
Radu Timofte
Goran Glavaš
31
10
0
20 Jun 2024
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Junjie Wang
Yin Zhang
Yatai Ji
Yuxiang Zhang
Chunyang Jiang
...
Bei Chen
Qunshu Lin
Minghao Liu
Ge Zhang
Wenhu Chen
48
3
0
20 Jun 2024
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
Siyu Yuan
Kaitao Song
Jiangjie Chen
Xu Tan
Dongsheng Li
Deqing Yang
LLMAG
66
14
0
20 Jun 2024
SpatialBot: Precise Spatial Understanding with Vision Language Models
Wenxiao Cai
Yaroslav Ponomarenko
Jianhao Yuan
Xiaoqi Li
Wankou Yang
Hao Dong
Bo Zhao
VLM
56
30
0
19 Jun 2024
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
Bingchen Zhao
Yongshuo Zong
Letian Zhang
Timothy Hospedales
VLM
33
15
0
18 Jun 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
51
30
0
18 Jun 2024
Look Further Ahead: Testing the Limits of GPT-4 in Path Planning
Mohamed Aghzal
Erion Plaku
Ziyu Yao
ELM
36
6
0
17 Jun 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Ziyu Liu
Tao Chu
Yuhang Zang
Xilin Wei
Xiaoyi Dong
...
Zijian Liang
Yuanjun Xiong
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
45
35
0
17 Jun 2024
Improving Multi-Agent Debate with Sparse Communication Topology
Yunxuan Li
Yibing Du
Jiageng Zhang
Le Hou
Peter Grabowski
Yeqing Li
Eugene Ie
LLMAG
36
18
0
17 Jun 2024
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Shihao Cai
Keqin Bao
Hangyu Guo
Jizhi Zhang
Jun Song
Bo Zheng
47
15
0
17 Jun 2024
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Yunxin Li
Xinyu Chen
Baotian Hu
Longyue Wang
Haoyuan Shi
Min-Ling Zhang
MLLM
LRM
56
26
0
17 Jun 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
Anas Awadalla
Le Xue
Oscar Lo
Manli Shu
Hannah Lee
...
Silvio Savarese
Caiming Xiong
Ran Xu
Yejin Choi
Ludwig Schmidt
75
25
0
17 Jun 2024
Generative Visual Instruction Tuning
Jefferson Hernandez
Ruben Villegas
Vicente Ordonez
VLM
38
3
0
17 Jun 2024
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
Hengyi Wang
Haizhou Shi
Shiwei Tan
Weiyi Qin
Wenyuan Wang
Tunyu Zhang
A. Nambi
T. Ganu
Hao Wang
71
15
0
17 Jun 2024
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report
Franz Louis Cesista
VGen
52
6
0
17 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
51
26
0
16 Jun 2024
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
Yu Zhang
Xiusi Chen
Bowen Jin
Sheng Wang
Shuiwang Ji
Wei Wang
Jiawei Han
49
29
0
16 Jun 2024