Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

11 February 2021
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
    VLM
    CLIP

Papers citing "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision"

50 / 841 papers shown
Learning to Decompose Visual Features with Latent Textual Prompts
Feng Wang
Manling Li
Xudong Lin
Hairong Lv
A. Schwing
Heng Ji
VLM
19
23
0
09 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
21
8
0
09 Oct 2022
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Feng Liang
Bichen Wu
Xiaoliang Dai
Kunpeng Li
Yinan Zhao
Hang Zhang
Peizhao Zhang
Peter Vajda
Diana Marculescu
CLIP
VLM
37
434
0
09 Oct 2022
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
VPVLM
VLM
212
532
0
06 Oct 2022
Content-Based Search for Deep Generative Models
Daohan Lu
Sheng-Yu Wang
Nupur Kumari
Rohan Agarwal
Mia Tang
David Bau
Jun-Yan Zhu
DiffM
SyDa
38
5
0
06 Oct 2022
CLIP model is an Efficient Continual Learner
Vishal G. Thengane
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
BDL
VLM
CLL
112
46
0
06 Oct 2022
Generalization Properties of Retrieval-based Models
Soumya Basu
A. S. Rawat
Manzil Zaheer
31
6
0
06 Oct 2022
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Jon Almazán
ByungSoo Ko
Geonmo Gu
Diane Larlus
Yannis Kalantidis
ObjD
VLM
39
7
0
05 Oct 2022
PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
Guangyi Chen
Weiran Yao
Xiangchen Song
Xinyue Li
Yongming Rao
Kun Zhang
VPVLM
VLM
8
62
0
03 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
35
25
0
03 Oct 2022
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Weicheng Kuo
Huayu Chen
Xiuye Gu
A. Piergiovanni
A. Angelova
MLLM
VLM
ObjD
51
134
0
30 Sep 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
VLM
27
19
0
30 Sep 2022
Mind Reader: Reconstructing complex images from brain activities
Sikun Lin
Thomas C. Sprague
Ambuj K. Singh
DiffM
124
86
0
30 Sep 2022
Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval
Zheng Li
Caili Guo
Xin Wang
Zerun Feng
Lei Li
Zhongtian Du
VLM
24
2
0
28 Sep 2022
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval
Xiaohan Zou
Changqiao Wu
Lele Cheng
Zhongyuan Wang
94
6
0
28 Sep 2022
UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
Janghyeon Lee
Jongsuk Kim
Hyounguk Shon
Bumsoo Kim
Seung Wook Kim
Honglak Lee
Junmo Kim
CLIP
VLM
52
53
0
27 Sep 2022
Paraphrasing Is All You Need for Novel Object Captioning
Cheng Yang
Yao-Hung Hubert Tsai
Wanshu Fan
Ruslan Salakhutdinov
Louis-Philippe Morency
Yu-Chiang Frank Wang
38
4
0
25 Sep 2022
GAMA: Generative Adversarial Multi-Object Scene Attacks
Abhishek Aich
Calvin-Khang Ta
Akash Gupta
Chengyu Song
S. Krishnamurthy
M. Salman Asif
A. Roy-Chowdhury
AAML
51
17
0
20 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIP
VLM
25
68
0
14 Sep 2022
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
Felix Vogel
Nina Shvetsova
Leonid Karlinsky
Hilde Kuehne
VLM
63
7
0
12 Sep 2022
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language
Bing-Huang Su
Dazhao Du
Zhao-Qing Yang
Yujie Zhou
Jiangmeng Li
Anyi Rao
Haoran Sun
Zhiwu Lu
Ji-Rong Wen
46
108
0
12 Sep 2022
Pre-training image-language transformers for open-vocabulary tasks
A. Piergiovanni
Weicheng Kuo
A. Angelova
VLM
ViT
39
8
0
09 Sep 2022
FETA: Towards Specializing Foundation Models for Expert Task Applications
Amit Alfassy
Assaf Arbelle
Oshri Halimi
Sivan Harary
Roei Herzig
...
Christoph Auer
Kate Saenko
Peter W. J. Staar
Rogerio Feris
Leonid Karlinsky
23
19
0
08 Sep 2022
Multimodal contrastive learning for remote sensing tasks
Umang Jain
Alex Wilson
Varun Gulshan
SSL
36
24
0
06 Sep 2022
Design of the topology for contrastive visual-textual alignment
Zhun Sun
30
1
0
05 Sep 2022
Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from A Conditional Causal Perspective
Jiangmeng Li
Yanan Zhang
Jingyao Wang
Hui Xiong
Chengbo Jiao
Xiaohui Hu
Changwen Zheng
Gang Hua
CML
34
28
0
26 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
54
158
0
25 Aug 2022
Contrastive Audio-Language Learning for Music
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
27
44
0
25 Aug 2022
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
44
3
0
24 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
Yanbei Chen
Massimiliano Mancini
Xiatian Zhu
Zeynep Akata
45
113
0
24 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
54
629
0
22 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
Chanwoo Park
Sangdoo Yun
Sanghyuk Chun
AAML
21
32
0
21 Aug 2022
Semantic-Enhanced Image Clustering
Shao-Qian Cai
Li-qing Qiu
Xiaojun Chen
Qin Zhang
Long Chen
VLM
33
13
0
21 Aug 2022
Mere Contrastive Learning for Cross-Domain Sentiment Analysis
Yun Luo
Fang Guo
Zihan Liu
Yue Zhang
39
15
0
18 Aug 2022
See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval
Xiujun Shu
Wei Wen
Haoqian Wu
Keyun Chen
Yi-Zhe Song
Ruizhi Qiao
Bohan Ren
Xiao Wang
27
91
0
18 Aug 2022
ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design
Xujie Zhang
Yuyang Sha
Michael C. Kampffmeyer
Zhenyu Xie
Zequn Jie
Chengwen Huang
Jianqing Peng
Xiaodan Liang
14
18
0
11 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
47
98
0
10 Aug 2022
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training
Jaeseok Byun
Taebaek Hwang
Jianlong Fu
Taesup Moon
VLM
23
11
0
08 Aug 2022
A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch
Patsorn Sangkloy
Wittawat Jitkrittum
Diyi Yang
James Hays
3DV
26
32
0
05 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
40
313
0
04 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
36
67
0
03 Aug 2022
Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics
Xiaoyuan Guo
Jiali Duan
C.-C. Jay Kuo
J. Gichoya
Imon Banerjee
VLM
22
1
0
31 Jul 2022
Cross-Modal Alignment Learning of Vision-Language Conceptual Systems
Taehyeong Kim
H. Song
Byoung-Tak Zhang
26
4
0
31 Jul 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
57
71
0
30 Jul 2022
ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval
Nicola Messina
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
Fabrizio Falchi
Giuseppe Amato
Rita Cucchiara
VLM
16
21
0
29 Jul 2022
Curriculum Learning for Data-Efficient Vision-Language Alignment
Tejas Srinivasan
Xiang Ren
Jesse Thomason
VLM
31
7
0
29 Jul 2022
Visual Recognition by Request
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
16
15
0
28 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
24
47
0
26 Jul 2022
A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues
Jason Armitage
L. Impett
Rico Sennrich
24
5
0
24 Jul 2022
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models
Huy Ha
Shuran Song
LM&Ro
VLM
43
102
0
23 Jul 2022