Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 10,380 papers shown
Transformer-based Image Generation from Scene Graphs
Renato Sortino
S. Palazzo
C. Spampinato
ViT
61
15
0
08 Mar 2023
Exploring Efficient-Tuned Learning Audio Representation Method from BriVL
Sen Fang
Yang Wu
Bowen Gao
Jingwen Cai
T. Teoh
DiffM
29
1
0
08 Mar 2023
CUDA: Convolution-based Unlearnable Datasets
Vinu Sankar Sadasivan
Mahdi Soltanolkotabi
S. Feizi
MU
31
25
0
07 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
40
509
0
07 Mar 2023
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
Minyoung Hwang
Jaeyeon Jeong
Minsoo Kim
Yoonseon Oh
Songhwai Oh
43
19
0
07 Mar 2023
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]
Maureen Daum
Enhao Zhang
Dong He
Stephen Mussmann
Brandon Haynes
Ranjay Krishna
Magdalena Balazinska
32
4
0
07 Mar 2023
ELODIN: Naming Concepts in Embedding Spaces
Rodrigo Mello
Filipe Calegario
Geber Ramalho
DiffM
35
1
0
07 Mar 2023
Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding
Jiacheng Li
Longhui Wei
Zongyuan Zhan
Xinfu He
Siliang Tang
Qi Tian
Yueting Zhuang
31
4
0
07 Mar 2023
Learning Discriminative Representations for Skeleton Based Action Recognition
Huanyu Zhou
Qingjie Liu
Yunhong Wang
34
72
0
07 Mar 2023
Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation
Vishnu Sashank Dorbala
James F. Mullen
Tianyi Zhou
LM&Ro
40
90
0
06 Mar 2023
To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning
Ildus Sadrtdinov
Dmitrii Pozdeev
Dmitry Vetrov
E. Lobacheva
40
4
0
06 Mar 2023
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
Hritik Bansal
Nishad Singhi
Yu Yang
Fan Yin
Aditya Grover
Kai-Wei Chang
AAML
39
42
0
06 Mar 2023
Neural Style Transfer for Vector Graphics
Valeria Efimova
Artyom Chebykin
Ivan Jarsky
Evgenii Prosvirnin
Andrey Filchenkov
32
5
0
06 Mar 2023
StyO: Stylize Your Face in Only One-Shot
Bonan Li
Zicheng Zhang
Xuecheng Nie
Congying Han
Yinhan Hu
Tiande Guo
DiffM
41
6
0
06 Mar 2023
CLIP-guided Prototype Modulating for Few-shot Action Recognition
Xiang Wang
Shiwei Zhang
Jun Cen
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
VLM
32
53
0
06 Mar 2023
Non-Parametric Outlier Synthesis
Leitian Tao
Xuefeng Du
Xiaojin Zhu
Yixuan Li
OODD
33
98
0
06 Mar 2023
Streaming Active Learning with Deep Neural Networks
Akanksha Saran
Safoora Yousefi
A. Krishnamurthy
John Langford
Jordan T. Ash
45
15
0
05 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
49
21
0
04 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
20
31
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
29
38
0
04 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL
DiffM
29
33
0
04 Mar 2023
PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling
Yuan Liu
Songyang Zhang
Jiacheng Chen
Kai-xiang Chen
Dahua Lin
75
28
0
04 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
37
4
0
04 Mar 2023
Open-Vocabulary Affordance Detection in 3D Point Clouds
Toan Nguyen
Minh Nhat Vu
Annalies Vuong
Dzung Nguyen
T. Vo
Ngan Le
A. Nguyen
3DPC
29
32
0
04 Mar 2023
Fine-Grained ImageNet Classification in the Wild
Maria Lymperaiou
Konstantinos Thomas
Giorgos Stamou
VLM
36
1
0
04 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
44
13
0
04 Mar 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao
Yongming Rao
Zuyan Liu
Benlin Liu
Jie Zhou
Jiwen Lu
ObjD
VLM
MDE
163
218
0
03 Mar 2023
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Renrui Zhang
Xiangfei Hu
Bohao Li
Siyuan Huang
Hanqiu Deng
Hongsheng Li
Yu Qiao
Peng Gao
VLM
MLLM
45
170
0
03 Mar 2023
Zero-shot Object Counting
Jingyi Xu
Hieu M. Le
Vu Nguyen
Viresh Ranjan
Dimitris Samaras
39
43
0
03 Mar 2023
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
Xiwen Liang
Minzhe Niu
Jianhua Han
Hang Xu
Chunjing Xu
Xiaodan Liang
VLM
36
14
0
03 Mar 2023
Meme Sentiment Analysis Enhanced with Multimodal Spatial Encoding and Facial Embedding
Muzhaffar Hazman
Susan McKeever
Josephine Griffith
25
4
0
03 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023
Alexa Arena: A User-Centric Interactive Platform for Embodied AI
Qiaozi Gao
Govind Thattai
Suhaila Shakiah
Xiaofeng Gao
Shreyas Pansare
...
Michael Johnston
R. Ghanadan
Arindam Mandal
Dilek Z. Hakkani-Tür
Premkumar Natarajan
6
27
0
02 Mar 2023
Computational Language Acquisition with Theory of Mind
Andy Liu
Hao Zhu
Emmy Liu
Yonatan Bisk
Graham Neubig
LLMAG
AI4CE
32
17
0
02 Mar 2023
Dropout Reduces Underfitting
Zhuang Liu
Zhi-Qin John Xu
Joseph Jin
Zhiqiang Shen
Trevor Darrell
52
36
0
02 Mar 2023
3D generation on ImageNet
Ivan Skorokhodov
Aliaksandr Siarohin
Yinghao Xu
Jian Ren
Hsin-Ying Lee
Peter Wonka
Sergey Tulyakov
69
55
0
02 Mar 2023
MLANet: Multi-Level Attention Network with Sub-instruction for Continuous Vision-and-Language Navigation
Zongtao He
Liuyi Wang
Shu Li
Qingqing Yan
Chengju Liu
Qi Chen
32
7
0
02 Mar 2023
Token Contrast for Weakly-Supervised Semantic Segmentation
Lixiang Ru
Heliang Zheng
Yibing Zhan
Bo Du
ViT
42
86
0
02 Mar 2023
X&Fuse: Fusing Visual Information in Text-to-Image Generation
Yuval Kirstain
Omer Levy
Adam Polyak
DiffM
27
5
0
02 Mar 2023
Image Labels Are All You Need for Coarse Seagrass Segmentation
Scarlett Raine
Ross Marchant
Branislav Kusy
Frederic Maire
Tobias Fischer
40
5
0
02 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
37
43
0
01 Mar 2023
Cross-Modal Entity Matching for Visually Rich Documents
Ritesh Sarkhel
Arnab Nandi
35
3
0
01 Mar 2023
Succinct Representations for Concepts
Yang Yuan
15
1
0
01 Mar 2023
Coarse-to-Fine Covid-19 Segmentation via Vision-Language Alignment
Dandan Shan
Zihan Li
Wentao Chen
Qingde Li
Jie Tian
Qingqi Hong
26
8
0
01 Mar 2023
Collage Diffusion
Vishnu Sarukkai
Linden Li
Arden Ma
Christopher Ré
Kayvon Fatahalian
DiffM
42
24
0
01 Mar 2023
Single Image Backdoor Inversion via Robust Smoothed Classifiers
Mingjie Sun
Zico Kolter
AAML
25
12
0
01 Mar 2023
Convolutional Visual Prompt for Robust Visual Perception
Yun-Yun Tsai
Chengzhi Mao
Junfeng Yang
VLM
VPVLM
44
13
0
01 Mar 2023
CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial Expression Recognition
Hanting Li
Hongjing Niu
Zhaoqing Zhu
Feng Zhao
VLM
CLIP
28
26
0
01 Mar 2023
Applying Plain Transformers to Real-World Point Clouds
Lanxiao Li
M. Heizmann
3DPC
ViT
31
3
0
28 Feb 2023
The Elements of Visual Art Recommendation: Learning Latent Semantic Representations of Paintings
B. Yilma
Luis A. Leiva
DiffM
27
15
0
28 Feb 2023