Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.06281
Cited By
MMBench: Is Your Multi-modal Model an All-around Player?
12 July 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
Wangbo Zhao
Yike Yuan
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MMBench: Is Your Multi-modal Model an All-around Player?"
50 / 139 papers shown
Title
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
112
0
0
25 May 2025
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning
Ye Mo
Zirui Shao
Kai Ye
Xianwei Mao
Bo Zhang
...
Gang Huang
Kehan Chen
Zhou Huan
Zixu Yan
Sheng Zhou
LRM
36
0
0
24 May 2025
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
34
0
0
22 May 2025
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
Penghao Wu
Lewei Lu
Ziwei Liu
59
0
0
21 May 2025
ModRWKV: Transformer Multimodality in Linear Time
Jiale Kang
Ziyin Yue
Qingyu Yin
Jiang Rui
W. Li
Zening Lu
Zhouran Ji
OffRL
42
0
0
20 May 2025
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
Bonan li
Zicheng Zhang
Songhua Liu
Weihao Yu
Xinchao Wang
VLM
69
0
0
17 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
101
0
0
08 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
118
1
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
189
0
0
05 May 2025
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Yiming Lei
Chenkai Zhang
Ziqiang Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
AI4TS
79
0
0
30 Apr 2025
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling
Siqi Li
Yufan Shen
Xiangnan Chen
Jiayi Chen
Hengwei Ju
...
Botian Shi
Y. Liu
Xinyu Cai
Yu Qiao
Yu Qiao
VLM
ELM
120
1
0
30 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
Xuzhao Li
MLLM
155
0
0
29 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
149
1
0
28 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
Zehao Wang
Senthil Purushwalkam
Caiming Xiong
Siyang Song
Chenhui Xu
Ran Xu
96
2
0
23 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
102
56
1
14 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
81
5
0
10 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Xinze Wang
Zhiyong Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
LRM
VLM
122
12
0
10 Apr 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
118
0
0
29 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Wentao Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
355
5
0
27 Mar 2025
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
Dongchen Lu
Yuyao Sun
Zilu Zhang
Leping Huang
Jianliang Zeng
Mao Shu
Huo Cao
76
1
0
27 Mar 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Yichao Yan
VLM
131
1
0
26 Mar 2025
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Hao Ai
Kunyi Wang
Zezhou Wang
H. Lu
Jin Tian
Yaxin Luo
Peng-Fei Xing
Jen-Yuan Huang
Huaxia Li
Gen Luo
MLLM
VLM
129
0
0
26 Mar 2025
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
Mehdi Moshtaghi
Siavash H. Khajavi
Joni Pajarinen
VLM
87
0
0
25 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zhengyang Liang
Ao Li
Yang Tian
Bo Zhao
VGen
VLM
166
3
0
24 Mar 2025
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yuxiao Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
133
1
0
24 Mar 2025
A Vision Centric Remote Sensing Benchmark
Abduljaleel Adejumo
Faegheh Yeganli
Clifford Broni-bediako
Aoran Xiao
Naoto Yokoya
Mennatullah Siam
88
0
0
20 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu Cheng
VLM
83
1
0
17 Mar 2025
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu
Khoi Duc Nguyen
Preeti Mukherjee
Saurabh Bagchi
Somali Chaterji
Yingyu Liang
Yin Li
LRM
85
2
0
13 Mar 2025
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang
Quan Cui
Bingchen Zhao
Cheng Yang
MLLM
SyDa
68
1
0
11 Mar 2025
Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
Jikang Cheng
113
1
0
11 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
102
3
0
10 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
366
1
0
10 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Zhenpeng Chen
Chunwei Wang
Xiuwei Chen
Hongbin Xu
Jiawei Han
Xiandan Liang
J. N. Han
Hang Xu
Xiaodan Liang
VLM
90
1
0
09 Mar 2025
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
Shwai He
Weilin Cai
Jiayi Huang
Ang Li
MoE
108
2
0
07 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
127
0
0
04 Mar 2025
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
128
1
0
24 Feb 2025
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing
Yi-Kai Zhang
De-Chuan Zhan
Han-Jia Ye
ALM
ELM
LRM
123
3
0
24 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
103
8
0
21 Feb 2025
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao
Wenqi Pei
Yifei Tao
Haiyang Mei
Mike Zheng Shou
89
0
0
20 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
125
1
0
18 Feb 2025
Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent
Junda Wu
Yuxin Xiong
Xintong Li
Yu Xia
Ruoyu Wang
...
Sungchul Kim
Ryan Rossi
Lina Yao
Jingbo Shang
Julian McAuley
CLL
VLM
84
0
0
17 Feb 2025
Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
Shiryu Ueno
Yoshikazu Hayashi
Shunsuke Nakatsuka
Yusei Yamada
Hiroaki Aizawa
K. Kato
MLLM
VLM
136
0
0
13 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
190
6
0
12 Feb 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Yi Li
Yuquan Deng
Jing Zhang
Joel Jang
Marius Memme
...
Fabio Ramos
Dieter Fox
Anqi Li
Abhishek Gupta
Ankit Goyal
LM&Ro
122
12
0
08 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
147
17
0
05 Feb 2025
PixelWorld: Towards Perceiving Everything as Pixels
Zhiheng Lyu
Xueguang Ma
Wenhu Chen
182
1
0
31 Jan 2025
Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
Bin Zhu
Hui yan Qi
Yinxuan Gui
Jingjing Chen
Chong-Wah Ngo
Ee-Peng Lim
314
1
0
31 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Wentao Zhang
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
144
21
0
21 Jan 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Ziyang Chen
Mingxiao Li
Zhongfu Chen
Nan Du
Xiaolong Li
Yuexian Zou
76
1
0
19 Jan 2025
Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
Amirreza Payandeh
Daeun Song
Mohammad Nazeri
Jing Liang
Praneel Mukherjee
Amir Hossain Raj
Yangzhe Kong
Dinesh Manocha
Xuesu Xiao
LM&Ro
LRM
138
5
0
17 Jan 2025
1
2
3
Next