2311.16502
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
Papers citing
"MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
50 / 700 papers shown
Title
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Zhongwei Wan
Zhihao Dou
Che Liu
Yu Zhang
Dongfei Cui
...
Yifan Jiang
Yangfan He
Mi Zhang
Shen Yan
LRM
92
1
0
02 Jun 2025
K12Vista: Exploring the Boundaries of MLLMs in K-12 Education
Chong Li
C. Zhu
Tao Zhang
Mingan Lin
Zenan Zhou
Jian Xie
LRM
56
0
0
02 Jun 2025
Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability
Mengliang He
Jiayi Zeng
Yankai Jiang
Wei Zhang
Zeming Liu
Xiaoming Shi
Aimin Zhou
26
0
0
02 Jun 2025
Is Extending Modality The Right Path Towards Omni-Modality?
Tinghui Zhu
Kai Zhang
Muhao Chen
Yu Su
VLM
54
0
0
02 Jun 2025
Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D
Artemis Panagopoulou
Le Xue
Honglu Zhou
Silvio Savarese
Ran Xu
Caiming Xiong
Chris Callison-Burch
Mark Yatskar
Juan Carlos Niebles
55
0
0
02 Jun 2025
MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Xiaojun Shan
Qi Cao
Xing Han
Haofei Yu
Paul Liang
55
0
0
02 Jun 2025
Improve MLLM Benchmark Efficiency through Interview
Farong Wen
Yijin Guo
Junying Wang
Jiaohao Xiao
Yingjie Zhou
Chunyi Li
Zicheng Zhang
Guangtao Zhai
MLLM
38
0
0
01 Jun 2025
GuessBench: Sensemaking Multimodal Creativity in the Wild
Zifeng Zhu
Shangbin Feng
Herun Wan
Ningnan Wang
Minnan Luo
Yulia Tsvetkov
MLLM
CoGe
VLM
84
0
0
01 Jun 2025
MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book
Sau Lai Yip
Sunan He
Yuxiang Nie
Shu Pui Chan
Yilin Ye
Sum Ying Lam
Hao-tao Chen
LM&MA
45
0
0
01 Jun 2025
EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
Zekun Wang
Minghua Ma
Zexin Wang
Rongchuan Mu
Liping Shan
Ming Liu
Bing Qin
VLM
34
0
0
31 May 2025
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways
Kailin Jiang
Yuntao Du
Yukai Ding
Yuchen Ren
Ning Jiang
Zhi Gao
Zilong Zheng
Lei Liu
Bin Li
Qing Li
KELM
51
0
0
30 May 2025
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Gen Luo
Ganlin Yang
Ziyang Gong
Guanzhou Chen
Haonan Duan
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Rongrong Ji
X. Zhu
LM&Ro
39
1
0
30 May 2025
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Junyu Luo
Zhizhuo Kou
Liming Yang
Xiao Luo
Jinsheng Huang
...
Jiaming Ji
Xuanzhe Liu
Sirui Han
Ming Zhang
Yike Guo
24
0
0
30 May 2025
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck
Yuwen Tan
Yuan Qing
Boqing Gong
49
0
0
30 May 2025
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Yiqing Liang
Jielin Qiu
Wenhao Ding
Zuxin Liu
James Tompkin
Mengdi Xu
Mengzhou Xia
Zhengzhong Tu
Laixi Shi
Jiacheng Zhu
OffRL
128
0
0
30 May 2025
Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework
Can Polat
Hasan Kurban
Erchin Serpedin
Mustafa Kurban
25
0
0
30 May 2025
AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits
Yichen Shi
Ze Zhang
Hongyang Wang
Zhuofu Tao
Zhongyi Li
Bingyu Chen
Yaxin Wang
Zhiping Yu
Ting-Jung Lin
Lei He
28
0
0
30 May 2025
The Road to Generalizable Neuro-Symbolic Learning Should be Paved with Foundation Models
Adam Stein
Aaditya Naik
Neelay Velingker
Mayur Naik
Eric Wong
NAI
AI4CE
31
1
0
30 May 2025
Evaluating Gemini in an arena for learning
LearnLM Team Google
Abhinit Modi
Aditya Srikanth Veerubhotla
Aliya Rysbek
Andrea Huber
...
Theofilos Strinopoulos
Wei-Jen Ko
Yael Gold-Zamir
Yael Haramaty
Yannis Assael
ELM
40
0
0
30 May 2025
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu
Boyun Zheng
Wenting Chen
Zhihao Peng
Zhenfei Yin
Jing Shao
Jiancong Hu
Yixuan Yuan
ELM
86
0
0
29 May 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM
LRM
76
0
0
29 May 2025
Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
Xu Chu
Xinrong Chen
Guanyu Wang
Zhijie Tan
Kui Huang
Wenyu Lv
Tong Mo
Weiping Li
LRM
VLM
85
0
0
29 May 2025
Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation
Zeyu Liu
Zhitian Hou
Yining Di
Kejing Yang
Zhijie Sang
...
Siyuan Liu
Jialu Wang
Chunming Li
Ming Li
Hongxia Yang
LM&MA
LRM
20
0
0
29 May 2025
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
Tian Qin
Core Francisco Park
Mujin Kwun
Aaron Walsman
Eran Malach
Nikhil Anand
Hidenori Tanaka
David Alvarez-Melis
ReLM
OffRL
LRM
90
0
0
28 May 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM
LRM
VLM
108
0
0
28 May 2025
CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
Jiali Chen
Xusen Hei
HongFei Liu
Yuancheng Wei
Zikun Deng
Jiayuan Xie
Yi Cai
Li Qing
57
0
0
28 May 2025
NegVQA: Can Vision Language Models Understand Negation?
Yuhui Zhang
Yuchang Su
Yiming Liu
Serena Yeung-Levy
MLLM
CoGe
50
0
0
28 May 2025
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Peter Robicheaux
Matvei Popov
Anish Madan
Isaac Robinson
Joseph Nelson
Deva Ramanan
Neehar Peri
ObjD
VLM
111
3
0
27 May 2025
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
Xuanwen Ding
Chengjun Pan
Zejun Li
Jiwen Zhang
Siyuan Wang
Zhongyu Wei
70
0
0
27 May 2025
DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response
Junjue Wang
Weihao Xuan
Heli Qi
Zhihao Liu
Kunyi Liu
...
Hongruixuan Chen
Jian Song
J. Xia
Zhuo Zheng
Naoto Yokoya
62
0
0
27 May 2025
Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models
Zesen Lyu
Dandan Zhang
Wei Ye
Fangdi Li
Zhihang Jiang
Yao Yang
ReLM
VLM
LRM
71
0
0
27 May 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
74
2
0
26 May 2025
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang
Pinxin Liu
Mingqian Feng
Zhangyun Tan
Rui Mao
...
Hang Hua
Ali Vosoughi
Luchuan Song
Zeliang Zhang
Chenliang Xu
LRM
71
1
0
26 May 2025
Can Visual Encoder Learn to See Arrows?
Naoyuki Terashita
Yusuke Tozaki
Hideaki Omote
Congkha Nguyen
Ryosuke Nakamoto
Yuta Koreeda
Hiroaki Ozaki
22
0
0
26 May 2025
ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models
Benjamin Clavié
Florian Brand
VLM
CoGe
64
0
0
25 May 2025
Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning
Shaohao Rui
Kaitao Chen
Weijie Ma
Xiaosong Wang
OffRL
LRM
25
0
0
25 May 2025
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
Kun Xiang
Heng Li
Terry Jingchen Zhang
Yinya Huang
Zirong Liu
...
J. N. Han
Hang Xu
Hanhui Li
Mrinmaya Sachan
Xiaodan Liang
LRM
186
0
0
25 May 2025
MLLMs are Deeply Affected by Modality Bias
Xu Zheng
Chenfei Liao
Yuqian Fu
Kaiyu Lei
Yuanhuiyi Lyu
...
Yu Jiang
N. Sebe
Dacheng Tao
Luc Van Gool
Xuming Hu
80
0
0
24 May 2025
Caption This, Reason That: VLMs Caught in the Middle
Zihan Weng
Lucas Gomez
Taylor Whittington Webb
P. Bashivan
VLM
LRM
50
0
0
24 May 2025
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
Sicheng Feng
Song Wang
Shuyi Ouyang
Lingdong Kong
Zikai Song
Jianke Zhu
Huan Wang
Xinchao Wang
LRM
108
0
0
24 May 2025
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
Ziwei Zhou
Rui Wang
Zuxuan Wu
AuLLM
VGen
82
0
0
23 May 2025
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
196
0
0
23 May 2025
Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion
Jacob A. Hansen
Wei Lin
Junmo Kang
M. Jehanzeb Mirza
Hongyin Luo
Rogerio Feris
Alan Ritter
James R. Glass
Leonid Karlinsky
VLM
258
0
0
23 May 2025
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Donghwan Chi
Hyomin Kim
Yoonjin Oh
Yongjin Kim
Donghoon Lee
DaeJin Jo
Jongmin Kim
Junyeob Baek
Sungjin Ahn
Sungwoong Kim
MLLM
VLM
488
0
0
23 May 2025
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma
Linge Du
Xuyang Shen
Shaoxiang Chen
Pengfei Li
Qibing Ren
Lizhuang Ma
Yuchao Dai
Pengfei Liu
Junjie Yan
OffRL
LRM
137
0
0
23 May 2025
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Zeping Yu
Sophia Ananiadou
MoMe
KELM
CLL
109
0
0
22 May 2025
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Zebin You
Shen Nie
Xiaolu Zhang
Jun Hu
Jun Zhou
Zhiwu Lu
J. Wen
Chongxuan Li
MLLM
VLM
112
2
0
22 May 2025
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
119
2
0
22 May 2025
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Chenhao Zhang
Yazhe Niu
118
0
0
22 May 2025
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
Meng-Hao Guo
Xuanyu Chu
Qianrui Yang
Zhe-Han Mo
Yiqing Shen
...
Kiyohiro Nakayama
Zhengyang Geng
Houwen Peng
Han Hu
Shi-Min Hu
LRM
197
0
0
22 May 2025