ResearchTrend.AI

Learning Transferable Visual Models From Natural Language Supervision


26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 9,770 papers shown
Extending CLIP for Category-to-image Retrieval in E-commerce
Mariya Hendriksen
Maurits J. R. Bleeker
Svitlana Vakulenko
Nanne van Noord
E. Kuiper
Maarten de Rijke
VLM
11
30
0
21 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
150
14,641
0
20 Dec 2021
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
84
3,475
0
20 Dec 2021
Mind-proofing Your Phone: Navigating the Digital Minefield with GreaseTerminator
Siddhartha Datta
Konrad Kollnig
N. Shadbolt
27
10
0
20 Dec 2021
Learning with Label Noise for Image Retrieval by Selecting Interactions
Sarah Ibrahimi
Arnaud Sors
Rafael Sampaio de Rezende
S. Clinchant
NoLa
VLM
24
16
0
20 Dec 2021
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
30
16
0
17 Dec 2021
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
S. Hoi
28
191
0
17 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
19
33
0
17 Dec 2021
Ensembling Off-the-shelf Models for GAN Training
Nupur Kumari
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
34
86
0
16 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
40
555
0
16 Dec 2021
CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking
George Zerveas
Navid Rekabsaz
Daniel Cohen
Carsten Eickhoff
31
8
0
16 Dec 2021
Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription
Nikolai Vogler
J. Allen
M. Miller
Taylor Berg-Kirkpatrick
29
5
0
16 Dec 2021
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning
Shiming Chen
Zi-Quan Hong
Wenjin Hou
Guosen Xie
Yibing Song
Jian-jun Zhao
Xinge You
Shuicheng Yan
Ling Shao
ViT
17
44
0
16 Dec 2021
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision
A. Shrivastava
Ramprasaath R. Selvaraju
Nikhil Naik
Vicente Ordonez
VLM
CLIP
30
6
0
14 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
VPVLM
35
343
0
13 Dec 2021
SAC-GAN: Structure-Aware Image Composition
Hang Zhou
Rui Ma
Ling-Xiao Zhang
Lina Gao
Ali Mahdavi-Amiri
Haotong Zhang
GAN
35
7
0
13 Dec 2021
Shaping Visual Representations with Attributes for Few-Shot Recognition
Haoxing Chen
Huaxiong Li
Yaohui Li
Chunlin Chen
34
7
0
13 Dec 2021
PartGlot: Learning Shape Part Segmentation from Language Reference Games
Juil Koo
Ian Huang
Panos Achlioptas
Leonidas J. Guibas
Minhyuk Sung
3DPC
30
28
0
13 Dec 2021
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
40
2
0
11 Dec 2021
More Control for Free! Image Synthesis with Semantic Diffusion Guidance
Xihui Liu
Dong Huk Park
S. Azadi
Gong Zhang
Arman Chopikyan
Yuxiao Hu
Humphrey Shi
Anna Rohrbach
Trevor Darrell
DiffM
36
251
0
10 Dec 2021
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu-Gang Jiang
VLM
MLLM
32
10
0
10 Dec 2021
Multimodal Interactions Using Pretrained Unimodal Models for SIMMC 2.0
Joosung Lee
Kijong Han
37
6
0
10 Dec 2021
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
Rameen Abdal
Peihao Zhu
John C. Femiani
Niloy J. Mitra
Peter Wonka
CLIP
39
103
0
09 Dec 2021
HairCLIP: Design Your Hair by Text and Reference Image
Tianyi Wei
Dongdong Chen
Wenbo Zhou
Jing Liao
Zhentao Tan
Lu Yuan
Weiming Zhang
Nenghai Yu
CLIP
33
108
0
09 Dec 2021
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
Can Wang
Menglei Chai
Mingming He
Dongdong Chen
Jing Liao
CLIP
29
377
0
09 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
40
687
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
34
129
0
08 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Jenq-Neng Hwang
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
31
1,018
0
07 Dec 2021
A Generic Approach for Enhancing GANs by Regularized Latent Optimization
Yufan Zhou
Chunyuan Li
Changyou Chen
Jinhui Xu
27
0
0
07 Dec 2021
Text2Mesh: Text-Driven Neural Stylization for Meshes
O. Michel
Roi Bar-On
Richard Liu
Sagie Benaim
Rana Hanocka
CLIP
AI4CE
199
351
0
06 Dec 2021
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples
Nir Zabari
Yedid Hoshen
VLM
33
26
0
06 Dec 2021
Embedding Arithmetic of Multimodal Queries for Image Retrieval
Guillaume Couairon
Matthieu Cord
Matthijs Douze
Holger Schwenk
35
23
0
06 Dec 2021
Joint Learning of Localized Representations from Medical Images and Reports
Philipp Muller
Georgios Kaissis
Cong Zou
Daniel Munich
137
81
0
06 Dec 2021
Forward Compatible Training for Large-Scale Embedding Retrieval Systems
Vivek Ramanujan
Pavan Kumar Anasosalu Vasu
Ali Farhadi
Oncel Tuzel
Hadi Pouransari
VLM
26
16
0
06 Dec 2021
Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification
Zhenhailong Wang
Heng Ji
84
71
0
05 Dec 2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
Longtian Qiu
Renrui Zhang
Ziyu Guo
Wei Zhang
Zilu Guo
Ziyao Zeng
Guangnan Zhang
VLM
CLIP
28
45
0
04 Dec 2021
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip H. S. Torr
148
306
0
04 Dec 2021
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Yichun Shi
Xiao Yang
Yangyue Wan
Xiaohui Shen
GAN
145
83
0
04 Dec 2021
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Xingchao Liu
Chengyue Gong
Lemeng Wu
Shujian Zhang
Haoran Su
Qiang Liu
CLIP
35
89
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
53
129
0
02 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM
CLIP
94
551
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
30
23
0
02 Dec 2021
Extract Free Dense Labels from CLIP
Chong Zhou
Chen Change Loy
Bo Dai
VLM
CLIP
45
455
0
02 Dec 2021
Editing a classifier by rewriting its prediction rules
Shibani Santurkar
Dimitris Tsipras
Mahalaxmi Elango
David Bau
Antonio Torralba
A. Madry
KELM
180
89
0
02 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
43
7
0
01 Dec 2021
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Michael Niemeyer
Jonathan T. Barron
B. Mildenhall
Mehdi S. M. Sajjadi
Andreas Geiger
Noha Radwan
51
579
0
01 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
17
79
0
01 Dec 2021
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Mattia Soldan
Alejandro Pardo
Juan Carlos León Alcázar
Fabian Caba Heilbron
Chen Zhao
Silvio Giancola
Guohao Li
VGen
44
95
0
01 Dec 2021
CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon
Jong Chul Ye
VLM
CLIP
27
240
0
01 Dec 2021
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Sahar Abdelnabi
Rakibul Hasan
Mario Fritz
26
74
0
30 Nov 2021