ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.05525
  4. Cited By
DeepSeek-VL: Towards Real-World Vision-Language Understanding
v1v2 (latest)

DeepSeek-VL: Towards Real-World Vision-Language Understanding

8 March 2024
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
Bo Liu
Jingxiang Sun
Zhaolin Ren
Zhuoshu Li
Hao Yang
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
    VLM
ArXiv (abs)PDFHTML

Papers citing "DeepSeek-VL: Towards Real-World Vision-Language Understanding"

9 / 109 papers shown
Title
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
Yuhang Wu
Wenmeng Yu
Yean Cheng
Yan Wang
Xiaohan Zhang
Jiazheng Xu
Ming Ding
Yuxiao Dong
102
2
0
13 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Yangfu Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
142
12
0
04 Jun 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Zhongyu Wei
LRMMLLM
183
17
0
27 May 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi-dong Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Hao Liu
Xiang Bai
Can Huang
Xiang Bai
Can Huang
185
28
0
20 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal
  Models with Open-Source Suites
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLMVLM
157
644
0
25 Apr 2024
From Image to Video, what do we need in multimodal LLMs?
From Image to Video, what do we need in multimodal LLMs?
Suyuan Huang
Haoxin Zhang
Yan Gao
Honggu Chen
Yan Gao
Yao Hu
Zhan Qin
VLM
110
8
0
18 Apr 2024
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary
  Instance Segmentation
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
Jiahao Xie
Wei Li
Xiangtai Li
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
DiffMVLM
154
37
0
22 Sep 2023
Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its
  Applications, Advantages, Limitations, and Future Directions in Natural
  Language Processing
Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing
Walid Hariri
AI4MHLM&MA
169
94
0
27 Mar 2023
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Yu-Chung Hsiao
Fedir Zubach
Maria Wang
Jindong Chen
Victor Carbune
Jason Lin
Maria Wang
Yun Zhu
Jindong Chen
RALM
207
30
0
16 Sep 2022
Previous
123