ResearchTrend.AI
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

17 November 2022
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
    VLM

Papers citing "I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision"

23 / 23 papers shown
Unicorn: Text-Only Data Synthesis for Vision Language Model Training
Xiaomin Yu
Pengxiang Ding
Wenjie Zhang
Siteng Huang
Songyang Gao
Chengwei Qin
Kejian Wu
Zhaoxin Fan
Ziyue Qiao
Donglin Wang
MLLM
SyDa
72
0
0
28 Mar 2025
Debiasing Vison-Language Models with Text-Only Training
Yunfan Yang
Chaoquan Jiang
Zhiyu Lin
Jinlin Xiao
Jiaming Zhang
Jitao Sang
VLM
28
1
0
12 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP
VLM
31
0
0
26 Sep 2024
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer
Kyra Ahrens
Stefan Wermter
CLIP
VLM
27
3
0
15 Jul 2024
ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities
Julie Mordacq
Léo Milecki
Maria Vakalopoulou
Steve Oudot
Vicky Kalogeiton
OffRL
MedIm
37
3
0
04 Jul 2024
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval
Lorenzo Agnolucci
Alberto Baldrati
Marco Bertini
A. Bimbo
38
10
0
05 May 2024
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Yogesh Kumar
Pekka Marttinen
MedIm
VLM
31
10
0
15 Mar 2024
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
27
10
0
14 Dec 2023
Language-only Efficient Training of Zero-shot Composed Image Retrieval
Geonmo Gu
Sanghyuk Chun
Wonjae Kim
Yoohoon Kang
Sangdoo Yun
26
14
0
04 Dec 2023
TLDR: Text Based Last-layer Retraining for Debiasing Image Classifiers
Juhyeon Park
Seokhyeon Jeong
Taesup Moon
35
1
0
30 Nov 2023
Image Captioning with Multi-Context Synthetic Data
Feipeng Ma
Y. Zhou
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
DiffM
25
7
0
29 May 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
45
86
0
06 Mar 2023
Text-Only Training for Image Captioning using Noise-Injected CLIP
David Nukrai
Ron Mokady
Amir Globerson
VLM
CLIP
60
94
0
01 Nov 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
J. Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
115
36
0
25 May 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
259
558
0
28 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
196
405
0
13 Jul 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Nayeon Lee
Weicheng Kuo
Huayu Chen
VLM
ObjD
225
899
0
28 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
317
780
0
18 Apr 2021
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
321
117
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
304
3,708
0
11 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
256
525
0
04 Feb 2021
Deep Domain-Adversarial Image Generation for Domain Generalisation
Kaiyang Zhou
Yongxin Yang
Timothy M. Hospedales
Tao Xiang
OOD
215
404
0
12 Mar 2020