2008.01392
Cited By
Learning Visual Representations with Caption Annotations
4 August 2020
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM
SSL
Papers citing
"Learning Visual Representations with Caption Annotations"
50 / 57 papers shown
Title
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
1
0
17 Apr 2025
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
A new approach for encoding code and assisting code understanding
Mengdan Fan
Changde Du
Haiyan Zhao
Zhi Jin
46
0
0
01 Aug 2024
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study
Chenguang Wang
Ruoxi Jia
Xin Liu
Dawn Song
VLM
29
7
0
15 Mar 2024
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Antonín Vobecký
Oriane Siméoni
David Hurych
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Josef Sivic
40
33
0
17 Jan 2024
From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion
Xingyuan Li
Yang Zou
Jinyuan Liu
Zhiying Jiang
Long Ma
Xin-Yue Fan
Risheng Liu
51
4
0
31 Dec 2023
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang
R. Prabha
Tianyuan Huang
Jiajun Wu
Ram Rajagopal
34
55
0
20 Dec 2023
Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models
Fei Pan
Sangryul Jeon
Brian Wang
Frank Mckenna
Stella X. Yu
44
2
0
19 Dec 2023
RECLIP: Resource-efficient CLIP by Training with Small Images
Runze Li
Dahun Kim
B. Bhanu
Weicheng Kuo
VLM
CLIP
36
13
0
12 Apr 2023
CUDA: Convolution-based Unlearnable Datasets
Vinu Sankar Sadasivan
Mahdi Soltanolkotabi
S. Feizi
MU
29
25
0
07 Mar 2023
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSL
VLM
21
28
0
23 Feb 2023
Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation
Bingqian Lin
Yi Zhu
Xiaodan Liang
Liang Lin
Jian-zhuo Liu
CoGe
LM&Ro
41
3
0
13 Feb 2023
Advancing Radiograph Representation Learning with Masked Record Modeling
Hong-Yu Zhou
Chenyu Lian
Lian-cheng Wang
Yizhou Yu
MedIm
38
55
0
30 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
43
11
0
17 Jan 2023
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
Chenhao Zheng
Ayush Shrivastava
Andrew Owens
VLM
33
11
0
11 Jan 2023
The Role of Local Alignment and Uniformity in Image-Text Contrastive Learning on Medical Images
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
24
7
0
14 Nov 2022
MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose
Rajat Hebbar
Krishna Somandepalli
Haoyang Zhang
Yin Cui
K. Cole-McLaughlin
Huisheng Wang
Shrikanth Narayanan
CLIP
22
21
0
20 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
32
7
0
19 Oct 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang
Zhenbang Wu
Dinesh Agarwal
Jimeng Sun
CLIP
VLM
MedIm
49
399
0
18 Oct 2022
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Weicheng Kuo
Yin Cui
Xiuye Gu
A. Piergiovanni
A. Angelova
MLLM
VLM
ObjD
51
134
0
30 Sep 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
27
16
0
01 Jul 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
30
124
0
15 Jun 2022
INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla
Shivam Chandhok
Milan Aggarwal
V. Balasubramanian
Balaji Krishnamurthy
VLM
41
2
0
13 Jun 2022
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang
Gabriel Ilharco
Mitchell Wortsman
Yu Wan
Vaishaal Shankar
Achal Dave
Ludwig Schmidt
VLM
OOD
33
139
0
03 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
98
6,650
0
13 Apr 2022
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Bin Xiao
Ce Liu
Lu Yuan
Jianfeng Gao
VLM
SSL
34
221
0
07 Apr 2022
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Mohamed Afham
Isuru Dissanayake
Dinithi Dissanayake
Amaya Dharmasiri
Kanchana Thilakarathna
Ranga Rodrigo
3DPC
16
251
0
01 Mar 2022
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM
CLIP
60
479
0
23 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
40
555
0
16 Dec 2021
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision
A. Shrivastava
Ramprasaath R. Selvaraju
Nikhil Naik
Vicente Ordonez
VLM
CLIP
30
6
0
14 Dec 2021
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
Rameen Abdal
Peihao Zhu
John C. Femiani
Niloy J. Mitra
Peter Wonka
CLIP
39
103
0
09 Dec 2021
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples
Nir Zabari
Yedid Hoshen
VLM
33
26
0
06 Dec 2021
Joint Learning of Localized Representations from Medical Images and Reports
Philip Muller
Georgios Kaissis
Congyu Zou
Daniel Rueckert
140
81
0
06 Dec 2021
Extract Free Dense Labels from CLIP
Chong Zhou
Chen Change Loy
Bo Dai
VLM
CLIP
48
455
0
02 Dec 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
25
20
0
30 Nov 2021
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
22
162
0
22 Nov 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM
CLIP
35
446
0
11 Oct 2021
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Aditya Sanghi
Hang Chu
Joseph G. Lambourne
Ye Wang
Chin-Yi Cheng
Marco Fumero
Kamal Rahimi Malekshan
CLIP
45
289
0
06 Oct 2021
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM
SSL
54
10
0
24 Sep 2021
Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong
Jing Shi
Jianwei Yang
Chenliang Xu
Yin Li
SSL
42
77
0
06 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
30
18
0
04 Sep 2021
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
64
691
0
04 Sep 2021
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
Zhijian Liu
Simon Stent
Jie Li
John Gideon
Song Han
VLM
25
10
0
26 Aug 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
202
405
0
13 Jul 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
18
274
0
09 Jun 2021
Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
Rui Cheng
Bichen Wu
Peizhao Zhang
Peter Vajda
Joseph E. Gonzalez
CLIP
VLM
21
31
0
18 Apr 2021
Exploring Visual Engagement Signals for Representation Learning
Menglin Jia
Zuxuan Wu
A. Reiter
Claire Cardie
Serge Belongie
Ser-Nam Lim
21
13
0
15 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
45
271
0
07 Apr 2021
Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays
Xiaosong Wang
Ziyue Xu
Leo K. Tam
Dong Yang
Daguang Xu
ViT
MedIm
22
23
0
30 Mar 2021