ResearchTrend.AI
Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation

2 January 2024
Change Che
Qunwei Lin
Xinyu Zhao
Jiaxin Huang
Liqiang Yu
    VLM

Papers citing "Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation"

4 / 4 papers shown
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
197
3,659
0
06 Aug 2019
Zero-Shot Learning -- The Good, the Bad and the Ugly
Yongqin Xian
Bernt Schiele
Zeynep Akata
55
837
0
13 Mar 2017
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
109
1,165
0
24 Nov 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
227
4,451
0
20 Nov 2014