2311.16502
Versions: v1, v2, v3, v4 (latest)
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
Papers citing "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
50 / 700 papers shown
MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
Suhao Yu
Haojin Wang
Juncheng Wu
Cihang Xie
Yuyin Zhou
77
1
0
22 May 2025
Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models
Chengcheng Wang
Jianyuan Guo
Hongguang Li
Yuchuan Tian
Ying Nie
Chang Xu
Kai Han
89
0
0
22 May 2025
CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models
Arnav Verma
Kushin Mukherjee
Christopher Potts
Elisa Kreiss
Judith E. Fan
36
0
0
22 May 2025
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Zeping Yu
Sophia Ananiadou
MoMe
KELM
CLL
109
0
0
22 May 2025
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Chenhao Zhang
Yazhe Niu
118
0
0
22 May 2025
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan
Kaituo Feng
Haoming Lyu
Dongzhan Zhou
Xiangyu Yue
ReLM
LRM
138
0
0
22 May 2025
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
Song Dai
Yibo Yan
Jiamin Su
Dongfang Zihao
Yubo Gao
...
Jungang Li
Junyan Zhang
Sicheng Tao
Zhuoran Gao
Xuming Hu
LRM
AI4CE
63
0
0
21 May 2025
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
Penghao Wu
Lewei Lu
Ziwei Liu
131
0
0
21 May 2025
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Alex Su
Haozhe Wang
Weiming Ren
Fangzhen Lin
Wenhu Chen
MLLM
OffRL
LRM
VLM
77
2
0
21 May 2025
lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu
Mingjia Huo
Yu Zhang
Haoyang Yu
Eric P. Xing
Ion Stoica
Tajana Rosing
Haojian Jin
Hao Zhang
140
1
0
21 May 2025
TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
Zeqing Wang
Shiyuan Zhang
Chengpei Tang
Keze Wang
LRM
81
0
0
21 May 2025
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Chengwei Wei
Bin Wang
Jung-jae Kim
Nancy F. Chen
AuLLM
ReLM
LRM
66
0
0
21 May 2025
Abacus: A Cost-Based Optimizer for Semantic Operator Systems
Matthew Russo
Sivaprasad Sudhir
Gerardo Vitagliano
Chunwei Liu
Tim Kraska
Samuel Madden
Michael Cafarella
153
0
0
20 May 2025
Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies
Haoyi Qiu
Kung-Hsiang Huang
Ruichen Zheng
Jiao Sun
Nanyun Peng
74
3
0
20 May 2025
ModRWKV: Transformer Multimodality in Linear Time
Jiale Kang
Ziyin Yue
Qingyu Yin
Jiang Rui
W. Li
Zening Lu
Zhouran Ji
OffRL
93
0
0
20 May 2025
CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation
Anna C. Doris
Md Ferdous Alam
Amin Heyrani Nobari
Faez Ahmed
82
0
0
20 May 2025
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
He Zhu
Junyou Su
Minxin Chen
Wen Wang
Yijie Deng
Guanhua Chen
Wenjia Zhang
197
0
0
20 May 2025
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
74
0
0
20 May 2025
Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
Seunghyuk Cho
Zhenyue Qin
Yang Liu
Youngbin Choi
Seungbeom Lee
Dongwoo Kim
LRM
108
0
0
20 May 2025
Debating for Better Reasoning: An Unsupervised Multimodal Approach
Ashutosh Adhikari
Mirella Lapata
LRM
49
0
0
20 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
241
0
0
19 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Yuxin Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
107
5
0
19 May 2025
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Maoyuan Ye
Jing Zhang
Juhua Liu
Bo Du
Dacheng Tao
LRM
180
0
0
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
132
0
0
18 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
132
3
0
17 May 2025
IQBench: How "Smart" Are Vision-Language Models? A Study with Human IQ Tests
Tan-Hanh Pham
Phu-Vinh Nguyen
Dang The Hung
Bui Trong Duong
Vu Nguyen Thanh
Chris Ngo
Tri Quang Truong
Truong-Son Hy
ReLM
CoGe
VLM
LRM
64
0
0
17 May 2025
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
Guijin Son
Jiwoo Hong
Honglu Fan
Heejeong Nam
Hyunwoo Ko
...
Jinyeop Song
Jinha Choi
Gonçalo Paulo
Youngjae Yu
Stella Biderman
108
1
0
17 May 2025
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
Bonan Li
Zicheng Zhang
Songhua Liu
Weihao Yu
Xinchao Wang
VLM
142
0
0
17 May 2025
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Yansheng Qiu
Li Xiao
Zhaopan Xu
Pengfei Zhou
Zheng Wang
Kai Zhang
ELM
LRM
144
0
0
16 May 2025
Visual Planning: Let's Think Only with Images
Yi Xu
Chengzu Li
Han Zhou
Xingchen Wan
Caiqi Zhang
Anna Korhonen
Ivan Vulić
LM&Ro
LRM
165
1
0
16 May 2025
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
Pengju Xu
Yan Wang
Shuyuan Zhang
Xuan Zhou
Xin Li
...
Fengzhao Li
Shuigeng Zhou
Xingyu Wang
Yi Zhang
Haiying Zhao
VLM
138
1
0
16 May 2025
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Pengfei Wang
Guohai Xu
Weinong Wang
Junjie Yang
Jie Lou
Yunhua Xue
104
0
0
15 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
78
0
0
14 May 2025
Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping
Yinuo Wang
Yue Zeng
Kai Chen
Cai Meng
Chao Pan
Zhouping Tang
57
0
0
14 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
Elodie Germani
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Shadi Albarqouni
AI4CE
115
0
0
14 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
87
0
0
11 May 2025
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
Takamitsu Omasa
Ryo Koshihara
Masumi Morishige
73
0
0
09 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
155
5
0
08 May 2025
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
Jaehyun Jeon
Janghan Yoon
Minsoo Kim
Sumin Shim
Yejin Choi
Hanbin Kim
Youngjae Yu
AAML
163
0
0
08 May 2025
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu
Sheng Zhang
Guanghui Qin
Timothy Ossowski
Yu Gu
...
Sam Preston
Mu-Hsin Wei
Paul Vozila
Tristan Naumann
Hoifung Poon
OOD
LRM
VLM
124
8
0
06 May 2025
Multi-Agent System for Comprehensive Soccer Understanding
Jiayuan Rao
Zhiyu Li
Haoning Wu
Yize Zhang
Yanfeng Wang
Weidi Xie
LLMAG
95
1
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
319
1
0
05 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
229
3
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
129
4
0
04 May 2025
Towards Artificial Intelligence Research Assistant for Expert-Involved Learning
Tianyu Liu
Simeng Han
Xiao Luo
Haoyu Wang
Pan Lu
...
Arman Cohan
Hua Xu
Mark B. Gerstein
James Zou
Hongyu Zhao
81
1
0
03 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
212
8
0
30 Apr 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Yue Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Yifan Yang
Qi Zhang
Lili Qiu
VLM
120
1
0
30 Apr 2025
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
Woongyeong Yeo
Kangsan Kim
Soyeong Jeong
Jinheon Baek
Sung Ju Hwang
150
1
0
29 Apr 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
Jingyun Zhang
Chuanqi Cheng
Yang Liu
Wen Liu
Jian Luan
Rui Yan
70
4
0
28 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
Lav Varshney
117
0
0
27 Apr 2025