ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.16502
  4. Cited By
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning
  Benchmark for Expert AGI
v1v2v3v4 (latest)

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
    OSLMELMVLM
ArXiv (abs)PDFHTML

Papers citing "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

50 / 700 papers shown
Title
MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
Suhao Yu
Haojin Wang
Juncheng Wu
Cihang Xie
Yuyin Zhou
77
1
0
22 May 2025
Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models
Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models
Chengcheng Wang
Jianyuan Guo
Hongguang Li
Yuchuan Tian
Ying Nie
Chang Xu
Kai Han
89
0
0
22 May 2025
CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models
CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models
Arnav Verma
Kushin Mukherjee
Christopher Potts
Elisa Kreiss
Judith E. Fan
36
0
0
22 May 2025
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Zeping Yu
Sophia Ananiadou
MoMeKELMCLL
109
0
0
22 May 2025
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Chenhao Zhang
Yazhe Niu
118
0
0
22 May 2025
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan
Kaituo Feng
Haoming Lyu
Dongzhan Zhou
Xiangyu Yue
ReLMLRM
138
0
0
22 May 2025
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
Song Dai
Yibo Yan
Jiamin Su
Dongfang Zihao
Yubo Gao
...
Jungang Li
Junyan Zhang
Sicheng Tao
Zhuoran Gao
Xuming Hu
LRMAI4CE
63
0
0
21 May 2025
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
Penghao Wu
Lewei Lu
Ziwei Liu
131
0
0
21 May 2025
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Alex Su
Haozhe Wang
Weiming Ren
Fangzhen Lin
Wenhu Chen
MLLMOffRLLRMVLM
77
2
0
21 May 2025
lmgame-Bench: How Good are LLMs at Playing Games?
lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu
Mingjia Huo
Yu Zhang
Haoyang Yu
Eric P. Xing
Ion Stoica
Tajana Rosing
Haojian Jin
Hao Zhang
140
1
0
21 May 2025
TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
Zeqing Wang
Shiyuan Zhang
Chengpei Tang
Keze Wang
LRM
81
0
0
21 May 2025
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Chengwei Wei
Bin Wang
Jung-jae Kim
Nancy F. Chen
AuLLMReLMLRM
66
0
0
21 May 2025
Abacus: A Cost-Based Optimizer for Semantic Operator Systems
Abacus: A Cost-Based Optimizer for Semantic Operator Systems
Matthew Russo
Sivaprasad Sudhir
Gerardo Vitagliano
Chunwei Liu
Tim Kraska
Samuel Madden
Michael Cafarella
153
0
0
20 May 2025
Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies
Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies
Haoyi Qiu
Kung-Hsiang Huang
Ruichen Zheng
Jiao Sun
Nanyun Peng
74
3
0
20 May 2025
ModRWKV: Transformer Multimodality in Linear Time
ModRWKV: Transformer Multimodality in Linear Time
Jiale Kang
Ziyin Yue
Qingyu Yin
Jiang Rui
W. Li
Zening Lu
Zhouran Ji
OffRL
93
0
0
20 May 2025
CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation
CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation
Anna C. Doris
Md Ferdous Alam
Amin Heyrani Nobari
Faez Ahmed
82
0
0
20 May 2025
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
He Zhu
Junyou Su
Minxin Chen
Wen Wang
Yijie Deng
Guanhua Chen
Wenjia Zhang
197
0
0
20 May 2025
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
74
0
0
20 May 2025
Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
Seunghyuk Cho
Zhenyue Qin
Yang Liu
Youngbin Choi
Seungbeom Lee
Dongwoo Kim
LRM
108
0
0
20 May 2025
Debating for Better Reasoning: An Unsupervised Multimodal Approach
Debating for Better Reasoning: An Unsupervised Multimodal Approach
Ashutosh Adhikari
Mirella Lapata
LRM
49
0
0
20 May 2025
Shadow-FT: Tuning Instruct via Base
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
241
0
0
19 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Yuxin Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
107
5
0
19 May 2025
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Maoyuan Ye
Jing Zhang
Juhua Liu
Bo Du
Dacheng Tao
LRM
180
0
0
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRLLRM
132
0
0
18 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLMLRM
132
3
0
17 May 2025
IQBench: How "Smart'' Are Vision-Language Models? A Study with Human IQ Tests
IQBench: How "Smart'' Are Vision-Language Models? A Study with Human IQ Tests
Tan-Hanh Pham
Phu-Vinh Nguyen
Dang The Hung
Bui Trong Duong
Vu Nguyen Thanh
Chris Ngo
Tri Quang Truong
Truong-Son Hy
ReLMCoGeVLMLRM
64
0
0
17 May 2025
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
Guijin Son
Jiwoo Hong
Honglu Fan
Heejeong Nam
Hyunwoo Ko
...
Jinyeop Song
Jinha Choi
Gonçalo Paulo
Youngjae Yu
Stella Biderman
108
1
0
17 May 2025
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
Bonan li
Zicheng Zhang
Songhua Liu
Weihao Yu
Xinchao Wang
VLM
142
0
0
17 May 2025
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Yansheng Qiu
Li Xiao
Zhaopan Xu
Pengfei Zhou
Zheng Wang
Kai Zhang
ELMLRM
144
0
0
16 May 2025
Visual Planning: Let's Think Only with Images
Visual Planning: Let's Think Only with Images
Yi Xu
Chengzu Li
Han Zhou
Xingchen Wan
Caiqi Zhang
Anna Korhonen
Ivan Vulić
LM&RoLRM
165
1
0
16 May 2025
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
Pengju Xu
Yan Wang
Shuyuan Zhang
Xuan Zhou
Xin Li
...
Fengzhao Li
Shuigeng Zhou
Xingyu Wang
Yi Zhang
Haiying Zhao
VLM
138
1
0
16 May 2025
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Pengfei Wang
Guohai Xu
Weinong Wang
Junjie Yang
Jie Lou
Yunhua Xue
104
0
0
15 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
78
0
0
14 May 2025
Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping
Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping
Yinuo Wang
Yue Zeng
Kai Chen
Cai Meng
Chao Pan
Zhouping Tang
57
0
0
14 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
Elodie Germani
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Shadi Albarqouni
AI4CE
115
0
0
14 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
87
0
0
11 May 2025
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
Takamitsu Omasa
Ryo Koshihara
Masumi Morishige
73
0
0
09 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLMVLM
155
5
0
08 May 2025
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
Jaehyun Jeon
Janghan Yoon
Minsoo Kim
Sumin Shim
Yejin Choi
Hanbin Kim
Youngjae Yu
AAML
163
0
0
08 May 2025
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu
Sheng Zhang
Guanghui Qin
Timothy Ossowski
Yu Gu
...
Sam Preston
Mu-Hsin Wei
Paul Vozila
Tristan Naumann
Hoifung Poon
OODLRMVLM
124
8
0
06 May 2025
Multi-Agent System for Comprehensive Soccer Understanding
Multi-Agent System for Comprehensive Soccer Understanding
Jiayuan Rao
Zhiyu Li
Haoning Wu
Yize Zhang
Yanfeng Wang
Weidi Xie
LLMAG
95
1
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
319
1
0
05 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
229
3
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELMLRM
129
4
0
04 May 2025
Towards Artificial Intelligence Research Assistant for Expert-Involved Learning
Towards Artificial Intelligence Research Assistant for Expert-Involved Learning
Tianyu Liu
Simeng Han
Xiao Luo
Haoyu Wang
Pan Lu
...
Arman Cohan
Hua Xu
Mark B. Gerstein
James Zou
Hongyu Zhao
81
1
0
03 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRLLRM
212
8
0
30 Apr 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Yue Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Yifan Yang
Qi Zhang
Lili Qiu
VLM
120
1
0
30 Apr 2025
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
Woongyeong Yeo
Kangsan Kim
Soyeong Jeong
Jinheon Baek
Sung Ju Hwang
150
1
0
29 Apr 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
Jingyun Zhang
Chuanqi Cheng
Yang Liu
Wen Liu
Jian Luan
Rui Yan
70
4
0
28 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
Lav Varshney
117
0
0
27 Apr 2025
Previous
123456...121314
Next