Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.08911
Cited By
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
14 May 2024
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Oncel Tuzel
VLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CLIP with Quality Captions: A Strong Pretraining for Vision Tasks"
12 / 12 papers shown
Title
Multilingual Vision-Language Pre-training for the Remote Sensing Domain
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
CLIP
VLM
42
1
0
30 Oct 2024
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
Qingqing Cao
Mahyar Najibi
Sachin Mehta
CLIP
DiffM
35
1
0
15 Oct 2024
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska
Mohammad Mahdi Derakhshani
Yuki M. Asano
N. V. Noord
Marcel Worring
Cees G. M. Snoek
VLM
46
3
0
13 Oct 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
58
4
0
09 Jul 2024
Slight Corruption in Pre-training Data Makes Better Diffusion Models
Hao Chen
Yujin Han
Diganta Misra
Xiang Li
Kai Hu
Difan Zou
Masashi Sugiyama
Jindong Wang
Bhiksha Raj
DiffM
47
5
0
30 May 2024
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
113
63
0
23 Mar 2023
Exploring Target Representations for Masked Autoencoders
Xingbin Liu
Jinghao Zhou
Tao Kong
Xianming Lin
Rongrong Ji
79
50
0
08 Sep 2022
Generalization in Neural Networks: A Broad Survey
Chris Rohlfs
OOD
AI4CE
11
6
0
04 Sep 2022
Revealing the Dark Secrets of Masked Image Modeling
Zhenda Xie
Zigang Geng
Jingcheng Hu
Zheng-Wei Zhang
Han Hu
Yue Cao
VLM
186
105
0
26 May 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
311
5,773
0
29 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
1