ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.16502
  4. Cited By
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning
  Benchmark for Expert AGI

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
    OSLM
    ELM
    VLM
ArXivPDFHTML

Papers citing "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

50 / 573 papers shown
Title
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yuxiao Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
53
0
0
24 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yuqing Yang
Afshin Dehghan
59
2
0
24 Mar 2025
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning
Dawei Yan
Yangfu Li
Qing-Guo Chen
Weihua Luo
Peng Wang
H. Zhang
Chunhua Shen
VGen
VLM
LRM
72
0
0
24 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
52
0
0
24 Mar 2025
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Yufei Zhan
Yousong Zhu
Shurong Zheng
Hongyin Zhao
Fan Yang
Ming Tang
J. T. Wang
VLM
67
3
0
23 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
45
0
0
22 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
48
0
0
22 Mar 2025
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
Eduard Allakhverdov
Elizaveta Goncharova
Andrey Kuznetsov
42
0
0
20 Mar 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang
Chenghui Lv
Xian Li
Shichao Dong
Huadong Li
Kelu Yao
Chao Li
Wenqi Shao
Ping Luo
62
1
0
19 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
59
0
0
19 Mar 2025
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang
Munan Ning
Zheyuan Liu
Yanbo Wang
Jiayi Ye
Yue Huang
Shuo Yang
Xiao Chen
Y. Song
Li Yuan
LRM
61
0
0
19 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yuyao Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
Tieniu Tan
188
2
0
18 Mar 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Xinyu Fang
Z. Chen
Kai Lan
Lixin Ma
Shengyuan Ding
...
Zicheng Zhang
Guofeng Zhang
Haodong Duan
K. Chen
Dahua Lin
MLLM
66
1
0
18 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
63
0
0
18 Mar 2025
Can Large Vision Language Models Read Maps Like a Human?
Can Large Vision Language Models Read Maps Like a Human?
Shuo Xing
Zezhou Sun
Shuangyu Xie
Kaiyuan Chen
Yanjia Huang
Yuping Wang
Jiachen Li
Dezhen Song
Zhengzhong Tu
68
2
0
18 Mar 2025
Growing a Twig to Accelerate Large Vision-Language Models
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao
Mingyang Wang
Zhou Yu
Wenwen Pan
Yan Yang
Tao Wei
Hao Zhang
Ning Mao
Wei Chen
Jun Yu
VLM
67
1
0
18 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
James Thorne
49
0
0
17 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
63
0
0
17 Mar 2025
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
Cheng Yuan
Ziqiang Liu
Jiashu Lv
Jiawei Shao
Yufei Jiang
Jun Zhang
Xuelong Li
50
1
0
17 Mar 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
Chong Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM
LRM
76
0
0
17 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu-Xi Cheng
VLM
62
1
0
17 Mar 2025
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
Sung-Yeon Park
Can Cui
Yunsheng Ma
Ahmadreza Moradipari
Rohit Gupta
Kyungtae Han
Ziran Wang
34
0
0
17 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
68
0
0
16 Mar 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Kanzhi Cheng
Wenpo Song
Jiaxin Fan
Zheng Ma
Qiushi Sun
Fangzhi Xu
Chenyang Yan
Nuo Chen
Jianbing Zhang
Jiajun Chen
MLLM
VLM
55
1
0
16 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yuyao Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
92
9
0
16 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
68
11
0
13 Mar 2025
Towards Understanding Graphical Perception in Large Multimodal Models
Kai Zhang
Jianwei Yang
J. Inala
Chandan Singh
Jianfeng Gao
Yu Su
Chenglong Wang
53
1
0
13 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
Jiashi Li
Xiang Yue
Bo Li
Ping Nie
Kai Zou
Wenhu Chen
LRM
79
2
0
13 Mar 2025
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
Ziyu Guo
Ray Zhang
Hao Chen
Jialin Gao
Dongzhi Jiang
Jiaze Wang
Pheng-Ann Heng
56
2
0
13 Mar 2025
VisTW: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan
VisTW: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan
Zhi Rui Tam
Ya-Ting Pai
Yen-Wei Lee
Yun-Nung Chen
CoGe
104
0
0
13 Mar 2025
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
Rui Yang
Lin Song
Yicheng Xiao
Runhui Huang
Yixiao Ge
Ying Shan
Hengshuang Zhao
MLLM
62
0
0
12 Mar 2025
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
Luozheng Qin
Zhiyu Tan
Mengping Yang
Xiaomeng Yang
Hao Li
87
0
0
12 Mar 2025
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
Mamba
MLLM
82
1
0
11 Mar 2025
ALLVB: All-in-One Long Video Understanding Benchmark
ALLVB: All-in-One Long Video Understanding Benchmark
Xichen Tan
Yuanjing Luo
Yunfan Ye
Fang Liu
Zhiping Cai
MLLM
VLM
85
0
0
10 Mar 2025
MapQA: Open-domain Geospatial Question Answering on Map Data
Zekun Li
Malcolm Grossman
Eric
Qasemi
Mihir Kulkarni
Muhao Chen
Yao-Yi Chiang
58
1
0
10 Mar 2025
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye
Xianhan Zeng
Fu Li
Chong Li
Haoqi Fan
CoGe
86
0
0
10 Mar 2025
Statistical Study of Sensor Data and Investigation of ML-based Calibration Algorithms for Inexpensive Sensor Modules: Experiments from Cape Point
Statistical Study of Sensor Data and Investigation of ML-based Calibration Algorithms for Inexpensive Sensor Modules: Experiments from Cape Point
Travis Barrett
Amit Kumar Mishra
45
0
0
09 Mar 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
Kaipeng Zhang
ELM
172
0
0
09 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Zhenpeng Chen
Chunwei Wang
Xiuwei Chen
Hang Xu
J. Han
Xiandan Liang
VLM
71
1
0
09 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
75
0
0
08 Mar 2025
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami
Mohammad Akbari
Kevin Cannons
Yong Zhang
65
0
0
07 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
77
0
0
04 Mar 2025
OWLViz: An Open-World Benchmark for Visual Question Answering
T. Nguyen
Dang Nguyen
Hoang Nguyen
Thuan Luong
Long Hoang Dang
Viet Dac Lai
VLM
66
0
0
04 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
58
1
0
04 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
99
2
0
04 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Wenjie Qu
Xiren Zhou
MoE
SyDa
76
28
0
03 Mar 2025
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
Sohan Patnaik
Rishabh Jain
Balaji Krishnamurthy
Mausoom Sarkar
34
0
0
01 Mar 2025
FC-Attack: Jailbreaking Large Vision-Language Models via Auto-Generated Flowcharts
FC-Attack: Jailbreaking Large Vision-Language Models via Auto-Generated Flowcharts
Ziyi Zhang
Zhen Sun
Zhe Zhang
Jihui Guo
Xinlei He
AAML
55
2
0
28 Feb 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
P. Wang
Zhongzhi Li
Fei Yin
Dekang Ran
Chenglin Liu
Cheng-Lin Liu
LRM
50
3
0
28 Feb 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
102
8
0
28 Feb 2025
Previous
123456...101112
Next