ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

21 December 2023
Jitesh Jain
Jianwei Yang
Humphrey Shi
    MLLM

Papers citing "VCoder: Versatile Vision Encoders for Multimodal Large Language Models"

23 / 23 papers shown
Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs
Dung Tien Nguyen
Minh Khoi Ho
Huy Ta
T. Nguyen
Qi Chen
...
Zhibin Liao
Minh Nguyen Nhat To
Johan W. Verjans
Phi Le Nguyen
Vu Minh Hieu Phan
29
0
0
30 Apr 2025
FaceInsight: A Multimodal Large Language Model for Face Perception
Jingzhi Li
Changjiang Luo
Ruoyu Chen
Hua Zhang
Wenqi Ren
Jianhou Gan
Xiaochun Cao
CVBM
LRM
65
0
0
22 Apr 2025
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models
Yeyuan Wang
D. Gao
Bin Li
Rujiao Long
Lei Yi
Xiaoyan Cai
Libin Yang
Jinxia Zhang
Shanqing Yu
Qi Xuan
78
1
0
22 Dec 2024
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Yangning Li
Hai-Tao Zheng
Xinyu Wang
Yong Jiang
Zhen Zhang
...
Hui Wang
Hai-Tao Zheng
Pengjun Xie
Philip S. Yu
Fei Huang
65
16
0
05 Nov 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
72
4
0
14 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
84
26
0
04 Oct 2024
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri
Zalan Fabian
Maryam Soltanolkotabi
Mahdi Soltanolkotabi
MedIm
48
3
0
23 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
39
1
0
19 Sep 2024
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Yassine Ouali
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
VLM
MLLM
32
18
0
19 Aug 2024
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Danfeng Guo
Sumitaka Honji
LRM
64
0
0
31 Jul 2024
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang
Teng Wang
Haigang Zhang
Ping Lu
Feng Zheng
MLLM
LRM
VLM
34
3
0
16 Jul 2024
SpatialBot: Precise Spatial Understanding with Vision Language Models
Wenxiao Cai
Yaroslav Ponomarenko
Jianhao Yuan
Xiaoqi Li
Wankou Yang
Hao Dong
Bo Zhao
VLM
56
30
0
19 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
35
6
0
05 Jun 2024
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
Sangmin Woo
Donguk Kim
Jaehyuk Jang
Yubin Choi
Changick Kim
42
12
0
28 May 2024
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Jiachen Li
Xinyao Wang
Sijie Zhu
Chia-Wen Kuo
Lu Xu
Fan Chen
Jitesh Jain
Humphrey Shi
Longyin Wen
MLLM
MoE
46
26
0
09 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
95
139
0
29 Apr 2024
A Survey on Hallucination in Large Vision-Language Models
Hanchao Liu
Wenyuan Xue
Yifei Chen
Dapeng Chen
Xiutian Zhao
Ke Wang
Liping Hou
Rong-Zhi Li
Wei Peng
LRM
MLLM
32
115
0
01 Feb 2024
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
287
4,261
0
30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
392
4,154
0
28 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
226
226
0
20 Jan 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
296
1,084
0
17 Feb 2021
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
200
434
0
27 Mar 2018