ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.16502
  4. Cited By
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning
  Benchmark for Expert AGI
v1v2v3v4 (latest)

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
    OSLMELMVLM
ArXiv (abs)PDFHTML

Papers citing "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

50 / 700 papers shown
Title
Can MLLMs Understand the Deep Implication Behind Chinese Images?
Can MLLMs Understand the Deep Implication Behind Chinese Images?
Chenhao Zhang
Xi Feng
Yuelin Bai
Xinrun Du
Jinchang Hou
...
Min Yang
Wenhao Huang
Chenghua Lin
Ge Zhang
Shiwen Ni
ELMVLM
74
6
0
17 Oct 2024
Unearthing Skill-Level Insights for Understanding Trade-Offs of
  Foundation Models
Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models
Mazda Moayeri
Vidhisha Balachandran
Varun Chandrasekaran
Safoora Yousefi
Thomas Fel
Soheil Feizi
Besmira Nushi
Neel Joshi
Vibhav Vineet
71
5
0
17 Oct 2024
Harnessing Webpage UIs for Text-Rich Visual Understanding
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu
Tianyue Ou
Yifan Song
Yuxiao Qu
Wai Lam
Chenyan Xiong
Wenhu Chen
Graham Neubig
Xiang Yue
153
10
0
17 Oct 2024
H2OVL-Mississippi Vision Language Models Technical Report
H2OVL-Mississippi Vision Language Models Technical Report
Shaikat Galib
Shanshan Wang
Guanshuo Xu
Pascal Pfeiffer
Ryan Chesler
Mark Landry
Sri Satish Ambati
MLLMVLM
49
4
0
17 Oct 2024
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao
Jiaming Han
Changsheng Li
Yu-Feng Li
Xiangyu Yue
RALM
96
1
0
17 Oct 2024
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Yuzhe Yang
Yifei Zhang
Yan Hu
Y. Guo
Ruoli Gan
...
Haining Wang
Qianqian Xie
Jimin Huang
Honghai Yu
Benyou Wang
ELMAIFin
115
2
0
17 Oct 2024
WorldMedQA-V: a multilingual, multimodal medical examination dataset for
  multimodal language models evaluation
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
João Matos
Shan Chen
Siena Placino
Yingya Li
Juan Carlos Climent Pardo
...
Hugo J. W. L. Aerts
Leo Anthony Celi
A. I. Wong
Danielle S. Bitterman
Jack Gallifant
64
3
0
16 Oct 2024
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of
  Large Multimodal Models Through Coding Tasks
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
Fengji Zhang
Linquan Wu
Huiyu Bai
Guancheng Lin
Xiao Li
Xiao Yu
Yue Wang
Bei Chen
Jacky Keung
MLLMELMLRM
104
0
0
16 Oct 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of
  MLLMs
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu
Linchao Zhu
Yi Yang
141
5
0
16 Oct 2024
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Botian Jiang
Lei Li
Xiaonan Li
Zhaowei Li
Xiachong Feng
Dianbo Sui
Qiang Liu
Xipeng Qiu
107
3
0
16 Oct 2024
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving
  Robust Reasoning in Large Language Models via Abstraction
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction
Kaiqiao Han
Tianqing Fang
Zhaowei Wang
Yangqiu Song
Mark Steedman
LRM
146
4
0
15 Oct 2024
When Does Perceptual Alignment Benefit Vision Representations?
When Does Perceptual Alignment Benefit Vision Representations?
Shobhita Sundaram
Stephanie Fu
Lukas Muttenthaler
Netanel Y. Tamir
Lucy Chai
Simon Kornblith
Trevor Darrell
Phillip Isola
113
8
1
14 Oct 2024
MEV Capture Through Time-Advantaged Arbitrage
MEV Capture Through Time-Advantaged Arbitrage
Robin Fritsch
Maria Ines Silva
A. Mamageishvili
Benjamin Livshits
E. Felten
108
9
0
14 Oct 2024
Balancing Continuous Pre-Training and Instruction Fine-Tuning:
  Optimizing Instruction-Following in LLMs
Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs
Ishan Jindal
Chandana Badrinath
Pranjal Bharti
Lakkidi Vinay
Sachin Dev Sharma
CLLALM
85
2
0
14 Oct 2024
Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
Adapt-∞\infty∞: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
A. Maharana
Jaehong Yoon
Tianlong Chen
Joey Tianyi Zhou
85
0
0
14 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
163
16
0
14 Oct 2024
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto
Yang Liu
Tongtong Cao
Y. Ren
Liu Bingbing
151
5
0
14 Oct 2024
Can We Predict Performance of Large Models across Vision-Language Tasks?
Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao
Ming Xu
Kartik Gupta
Akshay Asthana
Liang Zheng
Stephen Gould
130
0
0
14 Oct 2024
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained
  Vision-Language Models
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua
Yunlong Tang
Ziyun Zeng
Liangliang Cao
Zhengyuan Yang
Hangfeng He
Chenliang Xu
Jiebo Luo
VLMCoGe
70
13
0
13 Oct 2024
Skipping Computations in Multimodal LLMs
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
65
3
0
12 Oct 2024
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language
  Models Alignment
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
L. Chen
Yazheng Yang
Benyou Wang
Dianbo Sui
Qiang Liu
VLMALM
102
29
0
12 Oct 2024
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Yue Yang
Shanghang Zhang
Wenqi Shao
Kaipeng Zhang
Yi Bin
Yu Wang
Ping Luo
127
5
0
11 Oct 2024
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
H. Xia
Zhengbang Yang
Junbo Zou
Rhys Tracy
Yuqing Wang
...
Xun Shao
Zhuoqing Xie
Yuan-fang Wang
Weining Shen
Hanjie Chen
ReLMLRMELM
122
4
0
11 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
148
8
0
10 Oct 2024
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
Xiaoyuan Liu
Wenxuan Wang
Youliang Yuan
Jen-tse Huang
Qiuzhi Liu
Pinjia He
Zhaopeng Tu
434
4
0
10 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLMMLLM
163
34
0
10 Oct 2024
VHELM: A Holistic Evaluation of Vision Language Models
VHELM: A Holistic Evaluation of Vision Language Models
Tony Lee
Haoqin Tu
Chi Heem Wong
Wenhao Zheng
Yiyang Zhou
...
Josselin Somerville Roberts
Michihiro Yasunaga
Huaxiu Yao
Cihang Xie
Percy Liang
VLM
95
16
0
09 Oct 2024
Pixtral 12B
Pixtral 12B
Pravesh Agrawal
Szymon Antoniak
Emma Bou Hanna
Baptiste Bout
Devendra Singh Chaplot
...
Joachim Studnia
Sandeep Subramanian
Sagar Vaze
Thomas Wang
Sophia Yang
VLMMLLM
129
65
0
09 Oct 2024
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Haoran Zhang
Hangyu Guo
Shuyue Guo
Meng Cao
Wenhao Huang
Jiaheng Liu
Ge Zhang
VLMMLLMLRM
90
3
0
09 Oct 2024
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Yi Ding
Bolian Li
Ruqi Zhang
MLLM
138
15
0
09 Oct 2024
On the Modeling Capabilities of Large Language Models for Sequential
  Decision Making
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
Martin Klissarov
Devon Hjelm
Alexander Toshev
Bogdan Mazoure
LM&RoELMOffRLLRM
96
2
0
08 Oct 2024
Intuitions of Compromise: Utilitarianism vs. Contractualism
Intuitions of Compromise: Utilitarianism vs. Contractualism
Jared Moore
Yejin Choi
Sydney Levine
60
0
0
07 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLMLRM
74
8
0
06 Oct 2024
VISTA: A Visual and Textual Attention Dataset for Interpreting
  Multimodal Models
VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models
Harshit
Tolga Tasdizen
CoGeVLM
59
1
0
06 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with
  Collaborative Verification
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
82
25
0
05 Oct 2024
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav
Rohan Tomar
Garvit Jain
Chirag Ahooja
Shubham Chaudhary
Charles Elkan
77
0
0
05 Oct 2024
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Jiayi He
Hehai Lin
Q. Wang
Yi R. Fung
Chenhui Xu
ReLMLRM
218
7
0
05 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
219
37
0
04 Oct 2024
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal
  Foundation Models
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai
Vasileios Saveris
Chen Chen
Hong-You Chen
Haotian Zhang
...
Wenze Hu
Zhe Gan
Peter Grasch
Meng Cao
Yinfei Yang
VLM
72
4
0
03 Oct 2024
Visual Perception in Text Strings
Visual Perception in Text Strings
Qi Jia
Xiang Yue
Shanshan Huang
Ziheng Qin
Yizhu Liu
Bill Yuchen Lin
Yang You
VLM
80
2
0
02 Oct 2024
Why context matters in VQA and Reasoning: Semantic interventions for VLM
  input modalities
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Kenza Amara
Lukas Klein
Carsten T. Lüth
Paul Jäger
Hendrik Strobelt
Mennatallah El-Assady
80
2
0
02 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Z. Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
112
7
0
02 Oct 2024
LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
Zhenyue Qin
Yu Yin
Dylan Campbell
Xuansheng Wu
Ke Zou
Yih-Chung Tham
Ninghao Liu
Xiuzhen Zhang
Qingyu Chen
123
1
0
02 Oct 2024
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Sara Ghazanfari
Alexandre Araujo
Prashanth Krishnamurthy
Siddharth Garg
Farshad Khorrami
VLM
83
2
0
02 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLMMLLM
133
41
1
30 Sep 2024
Emu3: Next-Token Prediction is All You Need
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
116
233
0
27 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
112
23
0
26 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
Hang Xu
AuLLMMLLMVLM
186
29
0
26 Sep 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art
  Multimodal Models
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Matt Deitke
Christopher Clark
Sangho Lee
Rohun Tripathi
Yue Yang
...
Noah A. Smith
Hannaneh Hajishirzi
Ross Girshick
Ali Farhadi
Aniruddha Kembhavi
OSLMVLM
122
13
0
25 Sep 2024
Attention Prompting on Image for Large Vision-Language Models
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu
Weihao Yu
Xinchao Wang
VLM
102
11
0
25 Sep 2024
Previous
123...8910...121314
Next