Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
11 February 2021
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
Papers citing "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision"
50 / 794 papers shown
Training Vision-Language Models with Less Bimodal Supervision
Elad Segal
Ben Bogin
Jonathan Berant
VLM
21
2
0
01 Nov 2022
A simple, efficient and scalable contrastive masked autoencoder for learning visual representations
Shlok Kumar Mishra
Joshua Robinson
Huiwen Chang
David Jacobs
Aaron Sarna
Aaron Maschinot
Dilip Krishnan
DiffM
43
30
0
30 Oct 2022
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
Chaofan Ma
Yu-Hao Yang
Yanfeng Wang
Ya-Qin Zhang
Weidi Xie
VLM
26
48
0
27 Oct 2022
From colouring-in to pointillism: revisiting semantic segmentation supervision
Rodrigo Benenson
V. Ferrari
VLM
21
18
0
25 Oct 2022
Image-Text Retrieval with Binary and Continuous Label Supervision
Zheng Li
Caili Guo
Zerun Feng
Lei Li
Ying Jin
Yufeng Zhang
VLM
28
4
0
20 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
54
35
0
19 Oct 2022
CPL: Counterfactual Prompt Learning for Vision and Language Models
Xuehai He
Diji Yang
Weixi Feng
Tsu-jui Fu
Arjun Reddy Akula
Varun Jampani
P. Narayana
Sugato Basu
William Yang Wang
Qing Guo
VPVLM
VLM
50
15
0
19 Oct 2022
CLIP-Driven Fine-grained Text-Image Person Re-identification
Shuanglin Yan
Neng Dong
Liyan Zhang
Jinhui Tang
45
87
0
19 Oct 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang
Zhenbang Wu
Dinesh Agarwal
Jimeng Sun
CLIP
VLM
MedIm
49
399
0
18 Oct 2022
Using Language to Extend to Unseen Domains
Lisa Dunlap
Clara Mohri
Devin Guillory
Han Zhang
Trevor Darrell
Joseph E. Gonzalez
Aditi Raghunathan
Anja Rohrbach
VLM
20
35
0
18 Oct 2022
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Wenliang Dai
Zihan Liu
Ziwei Ji
Dan Su
Pascale Fung
MLLM
VLM
32
62
0
14 Oct 2022
Visual Classification via Description from Large Language Models
Sachit Menon
Carl Vondrick
VLM
35
287
0
13 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
20
68
0
12 Oct 2022
LiveSeg: Unsupervised Multimodal Temporal Segmentation of Long Livestream Videos
Jielin Qiu
Franck Dernoncourt
Trung Bui
Zhaowen Wang
Ding Zhao
Hailin Jin
AI4TS
19
5
0
12 Oct 2022
Underspecification in Scene Description-to-Depiction Tasks
Ben Hutchinson
Jason Baldridge
Vinodkumar Prabhakaran
DiffM
71
32
0
11 Oct 2022
Learning to Decompose Visual Features with Latent Textual Prompts
Feng Wang
Manling Li
Xudong Lin
Hairong Lv
A. Schwing
Heng Ji
VLM
19
23
0
09 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
21
8
0
09 Oct 2022
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Feng Liang
Bichen Wu
Xiaoliang Dai
Kunpeng Li
Yinan Zhao
Hang Zhang
Peizhao Zhang
Peter Vajda
Diana Marculescu
CLIP
VLM
37
433
0
09 Oct 2022
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
F. Khan
VPVLM
VLM
212
531
0
06 Oct 2022
Content-Based Search for Deep Generative Models
Daohan Lu
Sheng-Yu Wang
Nupur Kumari
Rohan Agarwal
Mia Tang
David Bau
Jun-Yan Zhu
DiffM
SyDa
38
5
0
06 Oct 2022
CLIP model is an Efficient Continual Learner
Vishal G. Thengane
Salman Khan
Munawar Hayat
F. Khan
BDL
VLM
CLL
112
46
0
06 Oct 2022
Generalization Properties of Retrieval-based Models
Soumya Basu
A. S. Rawat
Manzil Zaheer
29
6
0
06 Oct 2022
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Jon Almazán
ByungSoo Ko
Geonmo Gu
Diane Larlus
Yannis Kalantidis
ObjD
VLM
36
7
0
05 Oct 2022
PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
Guangyi Chen
Weiran Yao
Xiangchen Song
Xinyue Li
Yongming Rao
Anton van den Hengel
VPVLM
VLM
8
62
0
03 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
32
25
0
03 Oct 2022
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Weicheng Kuo
Huayu Chen
Xiuye Gu
A. Piergiovanni
A. Angelova
MLLM
VLM
ObjD
49
134
0
30 Sep 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
VLM
27
19
0
30 Sep 2022
Mind Reader: Reconstructing complex images from brain activities
Sikun Lin
Thomas C. Sprague
Ambuj K. Singh
DiffM
124
86
0
30 Sep 2022
Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval
Zheng Li
Caili Guo
Xin Wang
Zerun Feng
Lei Li
Zhongtian Du
VLM
24
2
0
28 Sep 2022
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval
Xiaohan Zou
Changqiao Wu
Lele Cheng
Zhongyuan Wang
94
6
0
28 Sep 2022
UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
Janghyeon Lee
Jongsuk Kim
Hyounguk Shon
Bumsoo Kim
Seung Wook Kim
Honglak Lee
Junmo Kim
CLIP
VLM
50
53
0
27 Sep 2022
Paraphrasing Is All You Need for Novel Object Captioning
Cheng Yang
Yao-Hung Hubert Tsai
Wanshu Fan
Ruslan Salakhutdinov
Louis-Philippe Morency
Yu-Chiang Frank Wang
36
4
0
25 Sep 2022
GAMA: Generative Adversarial Multi-Object Scene Attacks
Abhishek Aich
Calvin-Khang Ta
Akash Gupta
Chengyu Song
S. Krishnamurthy
M. Salman Asif
A. Roy-Chowdhury
AAML
51
17
0
20 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIP
VLM
25
68
0
14 Sep 2022
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
Felix Vogel
Nina Shvetsova
Leonid Karlinsky
Hilde Kuehne
VLM
63
7
0
12 Sep 2022
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language
Bing-Huang Su
Dazhao Du
Zhao-Qing Yang
Yujie Zhou
Jiangmeng Li
Anyi Rao
Haoran Sun
Zhiwu Lu
Ji-Rong Wen
46
107
0
12 Sep 2022
Pre-training image-language transformers for open-vocabulary tasks
A. Piergiovanni
Weicheng Kuo
A. Angelova
VLM
ViT
39
8
0
09 Sep 2022
FETA: Towards Specializing Foundation Models for Expert Task Applications
Amit Alfassy
Assaf Arbelle
Oshri Halimi
Sivan Harary
Roei Herzig
...
Christoph Auer
Kate Saenko
Peter W. J. Staar
Rogerio Feris
Leonid Karlinsky
23
19
0
08 Sep 2022
Multimodal contrastive learning for remote sensing tasks
Umang Jain
Alex Wilson
Varun Gulshan
SSL
36
24
0
06 Sep 2022
Design of the topology for contrastive visual-textual alignment
Zhun Sun
30
1
0
05 Sep 2022
Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from A Conditional Causal Perspective
Jiangmeng Li
Yanan Zhang
Wenwen Qiang
Hui Xiong
Chengbo Jiao
Xiaohui Hu
Changwen Zheng
Gang Hua
CML
34
28
0
26 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
54
158
0
25 Aug 2022
Contrastive Audio-Language Learning for Music
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
27
44
0
25 Aug 2022
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
44
3
0
24 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
Yanbei Chen
Massimiliano Mancini
Xiatian Zhu
Zeynep Akata
45
113
0
24 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
51
629
0
22 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
Chanwoo Park
Sangdoo Yun
Sanghyuk Chun
AAML
21
32
0
21 Aug 2022
Semantic-Enhanced Image Clustering
Shao-Qian Cai
Li-qing Qiu
Xiaojun Chen
Qin Zhang
Long Chen
VLM
31
13
0
21 Aug 2022
Mere Contrastive Learning for Cross-Domain Sentiment Analysis
Yun Luo
Fang Guo
Zihan Liu
Yue Zhang
33
15
0
18 Aug 2022
See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval
Xiujun Shu
Wei Wen
Haoqian Wu
Keyun Chen
Yi-Zhe Song
Ruizhi Qiao
Bohan Ren
Xiao Wang
27
91
0
18 Aug 2022