2311.04257
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
7 November 2023
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
Papers citing
"mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration"
50 / 109 papers shown
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao
Xuxin Cheng
Zhiqi Huang
Lei Li
165
2
0
01 Jul 2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
Tongtian Yue
Longteng Guo
Yepeng Tang
Zijia Zhao
Xinxin Zhu
Hua Huang
Jing Liu
MLLM
VLM
21
0
0
20 Jun 2025
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Shaolei Zhang
Shoutao Guo
Qingkai Fang
Yan Zhou
Yang Feng
MLLM
AuLLM
VLM
72
0
0
16 Jun 2025
RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis
Pengzuo Wu
Yuhang Yang
Guangcheng Zhu
Chao Ye
Hong Gu
...
Y. He
Liangyu Zha
Wentao Ye
Junbo Zhao
Haobo Wang
LMTD
18
0
0
16 Jun 2025
Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation
Lexiang Tang
Xianwei Zhuang
Bang Yang
Zhiyuan Hu
Hongxiang Li
Lu Ma
Jinghan Ru
Yuexian Zou
35
0
0
14 Jun 2025
Revolutionizing Clinical Trials: A Manifesto for AI-Driven Transformation
M. Schaar
Richard W. Peck
E. McKinney
Jim Weatherall
Stuart Bailey
...
Rafik Salama
Christina Gunther
Francesca Frau
Antoine Pugeat
Ramon Hernandez
MedIm
71
6
0
10 Jun 2025
Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline
Brian Gordon
Yonatan Bitton
Andreea Marzoca
Yasumasa Onoe
Xiao Wang
Daniel Cohen-Or
Idan Szpektor
CoGe
28
0
0
09 Jun 2025
Synthetic Visual Genome
J. S. Park
Zixian Ma
Linjie Li
Chenhao Zheng
Cheng-Yu Hsieh
...
Quan Kong
Norimasa Kobori
Ali Farhadi
Yejin Choi
Ranjay Krishna
30
0
0
09 Jun 2025
Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
Liangliang You
Junchi Yao
Shu Yang
Guimin Hu
Lijie Hu
Di Wang
MLLM
24
0
0
08 Jun 2025
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
Jiahui Wang
Z. Liu
Yongming Rao
Jiwen Lu
VLM
LRM
191
0
0
05 Jun 2025
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning
Daeun Lee
Jaehong Yoon
Jaemin Cho
Mohit Bansal
LRM
94
0
0
04 Jun 2025
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
Yicheng Xiao
Lin Song
Rui Yang
Cheng Cheng
Zunnan Xu
Zhaoyang Zhang
Yixiao Ge
Xiu Li
Ying Shan
66
2
0
03 Jun 2025
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
Hao Yan
Handong Zheng
Hao Wang
Liang Yin
Xingchen Liu
...
Minghui Liao
Chao Weng
Wei Chen
Yuliang Liu
Xiang Bai
LRM
58
0
0
03 Jun 2025
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
Youze Wang
Wenbo Hu
Yinpeng Dong
Jing Liu
Hanwang Zhang
Richang Hong
71
2
0
02 Jun 2025
Taming LLMs by Scaling Learning Rates with Gradient Grouping
Siyuan Li
Juanxi Tian
Zedong Wang
Xin Jin
Zicheng Liu
Wentao Zhang
Dan Xu
50
0
0
01 Jun 2025
Test-time Vocabulary Adaptation for Language-driven Object Detection
Mingxuan Liu
Tyler L. Hayes
Massimiliano Mancini
Elisa Ricci
Riccardo Volpi
G. Csurka
ObjD
TTA
VLM
55
0
0
31 May 2025
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment
Shuhao Han
Haotian Fan
Fangyuan Kong
Wenjie Liao
Chunle Guo
...
Jian Guo
Zhizhuo Shao
Ziyu Feng
Bing Li
Weiming Hu
198
11
0
22 May 2025
Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement
Kexin Zhang
Junlan Chen
Daifeng Li
Yuxuan Zhang
Yangyang Feng
Bowen Deng
Weixu Chen
LRM
79
0
0
22 May 2025
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
Feilong Tang
Chengzhi Liu
Zhongxing Xu
Ming Hu
Zelin Peng
...
Minquan Lin
Yifan Peng
Xuelian Cheng
Imran Razzak
Zongyuan Ge
80
1
0
22 May 2025
Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation
Junyang Wang
Haiyang Xu
Xi Zhang
Ming Yan
Ji Zhang
Fei Huang
Jitao Sang
131
0
0
20 May 2025
Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models
Kai Tang
Jinhao You
Xiuqi Ge
Hanze Li
Yichen Guo
Xiande Huang
MLLM
175
0
0
18 May 2025
ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing
Xuanle Zhao
Xuexin Liu
Haoyue Yang
Xianzhen Luo
Fanhu Zeng
Jianling Li
Qi Shi
Chi Chen
105
5
0
17 May 2025
Diverging Towards Hallucination: Detection of Failures in Vision-Language Models via Multi-token Aggregation
Geigh Zollicoffer
Minh Vu
Manish Bhattarai
VLM
94
0
0
16 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
111
1
0
03 May 2025
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
Yikun Ji
Y. Hong
Jiahui Zhan
H. Chen
Jun Lan
Huijia Zhu
Weiqiang Wang
Lefei Zhang
Jianfu Zhang
MLLM
LRM
117
0
0
19 Apr 2025
GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning
Liangyu Xu
Yingxiu Zhao
Jiadong Wang
Yingyao Wang
Bu Pi
...
Jihao Gu
Xinfeng Li
Xiaoyong Zhu
Jun Song
Jian Xu
LRM
512
6
0
17 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Mengli Zhu
Weidong Zhang
...
Pengfei Li
Chong Li
Junhao Zhu
Hao Yang
Shiliang Sun
122
2
0
16 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
233
132
1
14 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
165
5
0
03 Apr 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
75
0
0
03 Apr 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Weinan Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
197
9
0
27 Mar 2025
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts
Ling Zhong
Yujing Lu
Jing Yang
Weiming Li
Peng Wei
Yongheng Wang
Manni Duan
Qing Zhang
158
2
0
25 Mar 2025
Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models
Yize Zhang
Chunwang Zou
Bo Wang
Jing Qin
145
0
0
24 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu Cheng
VLM
181
1
0
17 Mar 2025
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan
Fei Kong
Hao-Ran Cheng
James Diffenderfer
B. Kailkhura
Lichao Sun
Xiaofeng Zhu
Xiaoshuang Shi
Kaidi Xu
484
4
0
13 Mar 2025
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model
Mingxing Li
Rui Wang
Lei Sun
Y. Bai
Xiangxiang Chu
102
0
0
08 Mar 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Wei Li
Bing Hu
Rui Shao
Leyang Shen
Liqiang Nie
106
4
0
05 Mar 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Dinesh Manocha
MoE
150
0
0
27 Feb 2025
MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression
Linjie Mu
Zhongzhen Huang
Shengqian Qin
Yakun Zhu
Shanghang Zhang
Xiaofan Zhang
108
1
0
17 Feb 2025
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
Shintaro Ozaki
Kazuki Hayashi
Yusuke Sakai
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
LRM
150
1
0
17 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhiyong Yang
Mike Zheng Shou
MoE
208
1
0
10 Feb 2025
Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails
Yijun Yang
L. Wang
Xiao Yang
Lanqing Hong
Jun Zhu
AAML
77
0
0
09 Feb 2025
A Hybrid Swarm Intelligence Approach for Optimizing Multimodal Large Language Models Deployment in Edge-Cloud-based Federated Learning Environments
Gaith Rjouba
Hanae Elmekki
Saidul Islam
Jamal Bentahar
Rachida Dssouli
98
1
0
04 Feb 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
Shivalika Singh
Nakul Sharma
Manish Gupta
Anand Mishra
143
1
0
28 Jan 2025
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
Jiaxing Zhao
Q. Yang
Yixing Peng
Detao Bai
Shimin Yao
...
Xiang Chen
Shenghao Fu
Weixuan Chen
Xihan Wei
Liefeng Bo
VGen
AuLLM
96
6
0
28 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
172
43
0
07 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
179
15
0
06 Jan 2025
HarmonyIQA: Pioneering Benchmark and Model for Image Harmonization Quality Assessment
Zitong Xu
Huiyu Duan
Guangji Ma
Liu Yang
Jiarui Wang
Qingbo Wu
Xiongkuo Min
Guangtao Zhai
P. Callet
86
3
0
03 Jan 2025
M³oralBench: A MultiModal Moral Benchmark for LVLMs
Bei Yan
Jie M. Zhang
Zhiyuan Chen
Shiguang Shan
Xilin Chen
ELM
95
3
0
31 Dec 2024
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang
Ziwei Zheng
Boxu Chen
Zhengyu Zhao
Chenhao Lin
Chao Shen
VLM
320
7
0
18 Dec 2024