arXiv: 2412.10302
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
13 December 2024
Z. F. Wu
Xiaokang Chen
Zizheng Pan
Xianglong Liu
Wen Liu
Damai Dai
Huazuo Gao
Yiyang Ma
Chengyue Wu
Bingxuan Wang
Zhenda Xie
Yu-Huan Wu
Kai Hu
Jiawei Wang
Yaofeng Sun
Yukun Li
Yishi Piao
Kang Guan
Aixin Liu
Xin Xie
Yuxiang You
Kai Dong
Xingkai Yu
Haowei Zhang
Liang Zhao
Yijiao Wang
Chong Ruan
MLLM
VLM
MoE
Papers citing
"DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding"
21 / 21 papers shown
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Muzhi Zhu
Hao Zhong
Canyu Zhao
Zongze Du
Zheng Huang
...
Hao Chen
Cheng Zou
Jingdong Chen
Ming-Hsuan Yang
Chunhua Shen
LRM
87
0
0
27 May 2025
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts
Minzhi Lin
Tianchi Xie
Mengchen Liu
Yilin Ye
C. L. Philip Chen
Shixia Liu
26
0
0
25 May 2025
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
Meng-Hao Guo
Xuanyu Chu
Qianrui Yang
Zhe-Han Mo
Yiqing Shen
...
Kiyohiro Nakayama
Zhengyang Geng
Houwen Peng
Han Hu
Shi-Min Hu
LRM
99
0
0
22 May 2025
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Chenhao Zhang
Yazhe Niu
59
0
0
22 May 2025
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment
Shuhao Han
Haotian Fan
Fangyuan Kong
Wenjie Liao
Chunle Guo
...
Jian Guo
Zhizhuo Shao
Ziyu Feng
Bing Li
Weiming Hu
105
6
0
22 May 2025
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Zeping Yu
Sophia Ananiadou
MoMe
KELM
CLL
57
0
0
22 May 2025
Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation
Junyang Wang
Haiyang Xu
Xi Zhang
Ming Yan
Ji Zhang
Fei Huang
Jitao Sang
47
0
0
20 May 2025
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Maoyuan Ye
Jing Zhang
Juhua Liu
Bo Du
Dacheng Tao
LRM
89
0
0
18 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
99
0
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
179
0
0
05 May 2025
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang
Wenliang Zheng
Aashrith Madasu
Peng Shi
Ryo Kamoi
...
Ranran Haoran Zhang
Avitej Iyer
Renze Lou
Wenpeng Yin
Rui Zhang
179
0
0
25 Apr 2025
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training
Xinsong Zhang
Yarong Zeng
Xinting Huang
Hu Hu
Runquan Xie
Han Hu
Zhanhui Kang
MLLM
VLM
139
1
0
17 Apr 2025
HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
Chen Zhang
Bo Hu
Weidong Chen
Zhendong Mao
366
0
0
14 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
94
0
0
02 Apr 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
107
0
0
13 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
373
0
0
11 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
74
1
0
04 Mar 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing
Yuping Wang
Peiran Li
Ruizheng Bai
Yansen Wang
Chan-wei Hu
Chengxuan Qian
Huaxiu Yao
Zhengzhong Tu
135
6
0
18 Feb 2025
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
D. Song
Sicheng Lai
Shunian Chen
Lichao Sun
Benyou Wang
360
0
0
06 Nov 2024
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
Ziyue Wang
Chi Chen
Ziyue Wang
Yurui Dong
Yuanchi Zhang
Yuzhuang Xu
Xiaolong Wang
Ziwei Sun
Yang Liu
LRM
57
3
0
07 Oct 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
125
167
0
29 Apr 2024