MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM, ELM, VLM

Papers citing "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

50 / 700 papers shown
Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs
Haoran Sun
Yankai Jiang
Wenjie Lou
Yujie Zhang
Wenjie Li
Lilong Wang
Mianxin Liu
Lei Liu
Xiaosong Wang
LRM
20
0
0
20 Jun 2025
AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models
Yuan Zhang
Chun-Kai Fan
Tao Huang
Ming Lu
Sicheng Yu
Junwen Pan
Kuan Cheng
Qi She
Shanghang Zhang
VLM, LRM
21
0
0
19 Jun 2025
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yong Man Ro
Yu-Chun Wang
Yueh-Hua Wu
VLM
46
0
0
18 Jun 2025
Demystifying the Visual Quality Paradox in Multimodal Large Language Models
Shuo Xing
Lanqing Guo
Hongyuan Hua
Seoyoung Lee
Peiran Li
Yufei Wang
Zhangyang Wang
Zhengzhong Tu
VLM
47
0
0
18 Jun 2025
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Ankan Deria
Adinath Madhavrao Dukre
Feilong Tang
Sara Atito
Sudipta Roy
Muhammad Awais
Muhammad Haris Khan
Imran Razzak
VLM
42
0
0
18 Jun 2025
Show-o2: Improved Native Unified Multimodal Models
Jinheng Xie
Zhenheng Yang
Mike Zheng Shou
VGen
46
0
0
18 Jun 2025
SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks
Zijian Song
Xiaoxin Lin
Qiuming Huang
Guangrun Wang
Liang Lin
LRM
30
0
0
17 Jun 2025
CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model
Jiangtong Li
Yiyun Zhu
Dawei Cheng
Zhijun Ding
Changjun Jiang
30
0
0
16 Jun 2025
CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making
Songtao Jiang
Yuan Wang
Ruizhe Chen
Yan Zhang
Ruilin Luo
...
Sibo Song
Yang Feng
Jimeng Sun
Jian Wu
Zuozhu Liu
OffRL, LRM
17
0
0
15 Jun 2025
Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
Youze Wang
Zijun Chen
Ruoyu Chen
Shishen Gu
Yinpeng Dong
Hang Su
Jun Zhu
Meng Wang
Richang Hong
Wenbo Hu
23
0
0
14 Jun 2025
Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language Models
Zongyu Wu
Minhua Lin
Zhiwei Zhang
Fali Wang
Xianren Zhang
Xiang Zhang
Suhang Wang
38
0
0
14 Jun 2025
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu
L. Qin
Wanxiang Che
Min-Yen Kan
MoE, VLM
34
0
0
13 Jun 2025
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis
Yuan Gao
Mattia Piccinini
Yuchen Zhang
Dingrui Wang
Korbinian Moller
...
Steven Peters
Andrea Stocco
Bassam Alrifaee
Marco Pavone
Johannes Betz
28
0
0
13 Jun 2025
VLM@school -- Evaluation of AI image understanding on German middle school knowledge
René Peinl
Vincent Tischler
CoGe, VLM
39
0
0
13 Jun 2025
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
Yingjin Song
Yupei Du
Denis Paperno
Albert Gatt
MLLM
133
0
0
12 Jun 2025
Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation
Zhiyang Xu
Jiuhai Chen
Zhaojiang Lin
Xichen Pan
Lifu Huang
...
Di Jin
Michihiro Yasunaga
Lili Yu
Xi Lin
Shaoliang Nie
121
1
0
12 Jun 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu
Y. Wu
Meng Chu
Zhifei Ren
Z. Huang
...
Conghui He
Yu Qiao
Yali Wang
Yi Wang
L. Wang
LRM
119
0
0
12 Jun 2025
Magistral
Mistral-AI
Abhinav Rastogi
Albert Q. Jiang
Andy Lo
Gabrielle Berrada
...
Virgile Richard
Wen-Ding Li
William Marshall
Xuanyu Zhang
Yunhao Tang
OffRL, ReLM, MoE, AI4TS, LRM
131
0
0
12 Jun 2025
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Xiyao Wang
Zhengyuan Yang
Chao Feng
Yongyuan Liang
Yuhang Zhou
...
Chung-Ching Lin
Kevin Lin
Linjie Li
Furong Huang
L. xilinx Wang
OffRL, LRM
64
0
0
11 Jun 2025
Vision Generalist Model: A Survey
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
70
0
0
11 Jun 2025
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang
Zhenhua Liu
Jiaheng Wei
Xuanwu Yin
Dong Li
E. Barsoum
LRM
84
0
0
11 Jun 2025
Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions
David Acuna
Ximing Lu
Jaehun Jung
Hyunwoo J. Kim
Amlan Kar
Sanja Fidler
Yejin Choi
ReLM, LRM
40
0
0
10 Jun 2025
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
Dianyi Wang
Wei Song
Yikun Wang
Siyuan Wang
Kaicheng Yu
Zhongyu Wei
Jiaqi Wang
41
1
0
10 Jun 2025
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
Zheqi He
Yesheng Liu
Jing-shu Zheng
Xuejing Li
Richeng Xuan
Jin-Ge Yao
Xi Yang
Xi Yang
MLLM, VLM
44
0
0
10 Jun 2025
Reinforcing Multimodal Understanding and Generation with Dual Self-rewards
Jixiang Hong
Yiran Zhang
Guanzhong Wang
Yi Liu
Ji-Rong Wen
Rui Yan
LRM
32
0
0
09 Jun 2025
From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
Zhanke Zhou
Xiao Feng
Zhaocheng Zhu
Jiangchao Yao
Sanmi Koyejo
Bo Han
LRM
22
0
0
09 Jun 2025
WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code
Zhiyu Lin
Zhengda Zhou
Zhiyuan Zhao
Tianrui Wan
Yilun Ma
Junyu Gao
Xuelong Li
ELM
27
0
0
09 Jun 2025
SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning
Mengya Xu
Zhongzhen Huang
Dillan Imans
Yiru Ye
Xiaofan Zhang
Qi Dou
26
0
0
08 Jun 2025
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury
Mohamed Elmoghany
Yohan Abeysinghe
Junjie Fei
Sayan Nag
Salman Khan
Mohamed Elhoseiny
Dinesh Manocha
35
0
0
08 Jun 2025
LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer
Ying Shen
Zhiyang Xu
Jiuhai Chen
Shizhe Diao
Jiaxin Zhang
Yuguang Yao
Joy Rimchala
Ismini Lourentzou
Lifu Huang
OffRL
33
0
0
08 Jun 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
Bhuiyan Sanjid Shafique
Ashmal Vayani
Muhammad Maaz
H. Rasheed
Dinura Dissanayake
...
Shin'ichi Satoh
Michael Felsberg
M. Shah
Salman Khan
Fahad Shahbaz Khan
VLM
26
0
0
08 Jun 2025
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging
Zichen Tang
Haihong E
Ziyan Ma
Haoyang He
Jiacheng Liu
...
Kun Ji
Qing Huang
Xinyang Hu
Yang Liu
Qianhe Zheng
AIMat, AIFin, ELM
82
0
0
06 Jun 2025
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
Hengzhi Li
Brendon Jiang
Alexander Naehu
Regan Song
Justin Zhang
...
Steven-Shine Chen
Adithya Balachandran
Wei Dai
Rebecca Chang
Paul Pu Liang
ReLM, LRM
68
0
0
06 Jun 2025
MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?
Zhitao He
Zongwei Lyu
Dazhong Chen
Dadi Guo
Yi R. Fung
LRM
64
0
0
06 Jun 2025
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
Linjie Li
Mahtab Bigverdi
Jiawei Gu
Zixian Ma
Yinuo Yang
Ziang Li
Yejin Choi
Ranjay Krishna
LRM
93
0
0
05 Jun 2025
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Xin Jin
Zhenguo Li
James T. Kwok
Yu Zhang
LRM
108
0
0
05 Jun 2025
MuSciClaims: Multimodal Scientific Claim Verification
Yash Kumar Lal
Manikanta Bandham
Mohammad Saqib Hasan
Apoorva Kashi
Mahnaz Koupaee
Niranjan Balasubramanian
98
0
0
05 Jun 2025
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
H. Rasheed
Abdelrahman M. Shaker
Anqi Tang
Muhammad Maaz
Ming-Hsuan Yang
Salman Khan
Fahad A Khan
AIMat
123
0
0
05 Jun 2025
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark
Junjie Xing
Yeye He
Mengyu Zhou
Haoyu Dong
Shi Han
Lingjiao Chen
Dongmei Zhang
S. Chaudhuri
H. V. Jagadish
LMTD, ELM, LRM
44
0
0
05 Jun 2025
Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction
Zhihao Tang
Chaozhuo Li
Litian Zhang
Xi Zhang
DiffM, MedIm
52
9
0
05 Jun 2025
CIVET: Systematic Evaluation of Understanding in VLMs
Massimo Rizzoli
Simone Alghisi
Olha Khomyn
Gabriel Roccabruna
Seyed Mahed Mousavi
Giuseppe Riccardi
174
0
0
05 Jun 2025
Multimodal Tabular Reasoning with Privileged Structured Information
Jun-Peng Jiang
Yu Xia
Hai-Long Sun
Shiyin Lu
Qing-Guo Chen
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
LMTD, LRM
102
0
0
04 Jun 2025
How Far Are We from Predicting Missing Modalities with Foundation Models?
Guanzhou Ke
Yi Xie
Xiaoli Wang
Guoqing Chao
Bo Wang
Shengfeng He
VLM
109
0
0
04 Jun 2025
Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts
Jiaxing Zhang
Xinyi Zeng
Hao Tang
91
0
0
04 Jun 2025
MiMo-VL Technical Report
Xiaomi LLM-Core Team
Zihao Yue
Zhenru Lin
Yifan Song
Weikun Wang
...
Di Zhang
Chong Ma
Chang Liu
Can Cai
Bingquan Xia
OffRL, MoE, VLM, LRM
91
0
0
04 Jun 2025
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
Hao Yan
Handong Zheng
Hao Wang
Liang Yin
Xingchen Liu
...
Minghui Liao
Chao Weng
Wei Chen
Yuliang Liu
Xiang Bai
LRM
58
0
0
03 Jun 2025
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Bin Lin
Zongjian Li
Xinhua Cheng
Yuwei Niu
Yang Ye
...
Wangbo Yu
Shaodong Wang
Yunyang Ge
Yatian Pang
Li Yuan
VLM
78
0
0
03 Jun 2025
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Mengdi Jia
Zekun Qi
Shaochen Zhang
Wenyao Zhang
Xinqiang Yu
Jiawei He
He Wang
L. Yi
LRM, VLM
62
0
0
03 Jun 2025
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
Yicheng Xiao
Lin Song
Rui Yang
Cheng Cheng
Zunnan Xu
Zhaoyang Zhang
Yixiao Ge
Xiu Li
Ying Shan
62
2
0
03 Jun 2025
MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Xiaojun Shan
Qi Cao
Xing Han
Haofei Yu
Paul Liang
55
0
0
02 Jun 2025