ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 10,236 papers shown
Title
TimbreCLIP: Connecting Timbre to Text and Images
TimbreCLIP: Connecting Timbre to Text and Images
Nicolas Jonason
Bob L. T. Sturm
CLIP
33
4
0
21 Nov 2022
Investigating Prompt Engineering in Diffusion Models
Investigating Prompt Engineering in Diffusion Models
Sam Witteveen
Martin Andrews
11
58
0
21 Nov 2022
Cross-Modal Contrastive Learning for Robust Reasoning in VQA
Cross-Modal Contrastive Learning for Robust Reasoning in VQA
Qinjie Zheng
Chaoyue Wang
Daqing Liu
Dadong Wang
Dacheng Tao
LRM
34
0
0
21 Nov 2022
Language in a Bottle: Language Model Guided Concept Bottlenecks for
  Interpretable Image Classification
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
Yue Yang
Artemis Panagopoulou
Shenghao Zhou
Daniel Jin
Chris Callison-Burch
Mark Yatskar
54
214
0
21 Nov 2022
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating
  Unified Vision Language Model
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
33
32
0
21 Nov 2022
Unifying Tracking and Image-Video Object Detection
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya . Dutta Roy
Ashish Shah
Ser-Nam Lim
28
0
0
20 Nov 2022
MagicVideo: Efficient Video Generation With Latent Diffusion Models
MagicVideo: Efficient Video Generation With Latent Diffusion Models
Daquan Zhou
Weimin Wang
Hanshu Yan
Weiwei Lv
Yizhe Zhu
Jiashi Feng
DiffM
VGen
41
373
0
20 Nov 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach
  to Cross-Modal Sarcasm Generation
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
31
1
0
20 Nov 2022
Leveraging per Image-Token Consistency for Vision-Language Pre-training
Leveraging per Image-Token Consistency for Vision-Language Pre-training
Yunhao Gou
Tom Ko
Hansi Yang
James T. Kwok
Yu Zhang
Mingxuan Wang
VLM
16
10
0
20 Nov 2022
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
Xichen Pan
Pengda Qin
Yuhong Li
Hui Xue
Wenhu Chen
DiffM
29
63
0
20 Nov 2022
DiffStyler: Controllable Dual Diffusion for Text-Driven Image
  Stylization
DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization
Nisha Huang
Yuxin Zhang
Fan Tang
Chongyang Ma
Haibin Huang
Yong Zhang
Weiming Dong
Changsheng Xu
DiffM
28
41
0
19 Nov 2022
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional
  Zero-Shot Learning
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning
Xiaocheng Lu
Ziming Liu
Song Guo
Jingcai Guo
CoGe
33
30
0
19 Nov 2022
A Unified Model for Video Understanding and Knowledge Embedding with
  Heterogeneous Knowledge Graph Dataset
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset
Jiaxin Deng
Dong Shen
Haojie Pan
Xiangyu Wu
Ximan Liu
Gaofeng Meng
Fan Yang
Size Li
Ruiji Fu
Zhongyuan Wang
25
1
0
19 Nov 2022
Operationalizing Specifications, In Addition to Test Sets for Evaluating
  Constrained Generative Models
Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models
Vikas Raunak
Matt Post
Arul Menezes
EGVM
37
0
0
19 Nov 2022
Magic3D: High-Resolution Text-to-3D Content Creation
Magic3D: High-Resolution Text-to-3D Content Creation
Chen-Hsuan Lin
Jun Gao
Luming Tang
Towaki Takikawa
Fangyin Wei
Xun Huang
Karsten Kreis
Sanja Fidler
Ming Liu
Nayeon Lee
67
1,119
0
18 Nov 2022
Visual Programming: Compositional visual reasoning without training
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
94
406
0
18 Nov 2022
Weighted Ensemble Self-Supervised Learning
Weighted Ensemble Self-Supervised Learning
Yangjun Ruan
Saurabh Singh
Warren Morningstar
Alexander A. Alemi
Sergey Ioffe
Ian S. Fischer
Joshua V. Dillon
FedML
29
15
0
18 Nov 2022
Data-Centric Debugging: mitigating model failures via targeted data
  collection
Data-Centric Debugging: mitigating model failures via targeted data collection
Sahil Singla
Atoosa Malemir Chegini
Mazda Moayeri
Soheil Feiz
27
4
0
17 Nov 2022
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and
  Vision-Language Tasks
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Hongsheng Li
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
26
55
0
17 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual
  Information
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
42
41
0
17 Nov 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
94
1,711
0
17 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM
CLIP
50
24
0
17 Nov 2022
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
James Smith
Paola Cascante-Bonilla
Assaf Arbelle
Donghyun Kim
Yikang Shen
David D. Cox
Diyi Yang
Z. Kira
Rogerio Feris
Leonid Karlinsky
VLM
47
20
0
17 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only
  Language Supervision
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
27
24
0
17 Nov 2022
Cross-Modal Adapter for Text-Video Retrieval
Cross-Modal Adapter for Text-Video Retrieval
Haojun Jiang
Jianke Zhang
Rui Huang
Chunjiang Ge
Zanlin Ni
Jiwen Lu
Jie Zhou
S. Song
Gao Huang
53
36
0
17 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video
  UniFormer
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
30
107
0
17 Nov 2022
GLAMI-1M: A Multilingual Image-Text Fashion Dataset
GLAMI-1M: A Multilingual Image-Text Fashion Dataset
Vaclav Kosar
A. Hoskovec
Milan Šulc
Radek Bartyzal
VLM
32
3
0
17 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image
  Captioning
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
24
27
0
17 Nov 2022
How to Fine-Tune Vision Models with SGD
How to Fine-Tune Vision Models with SGD
Ananya Kumar
Ruoqi Shen
Sébastien Bubeck
Suriya Gunasekar
VLM
14
29
0
17 Nov 2022
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive
  Survey
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
26
14
0
17 Nov 2022
Language-Assisted Deep Learning for Autistic Behaviors Recognition
Language-Assisted Deep Learning for Autistic Behaviors Recognition
Andong Deng
Taojiannan Yang
Chong Chen
Qian Chen
Leslie C. Neely
Sakiko Oyama
26
8
0
17 Nov 2022
Few-shot Learning for Multi-modal Social Media Event Filtering
Few-shot Learning for Multi-modal Social Media Event Filtering
José Nascimento
J. P. Cardenuto
J. Yang
Anderson de Rezende Rocha
22
3
0
16 Nov 2022
Prompt Tuning for Parameter-efficient Medical Image Segmentation
Prompt Tuning for Parameter-efficient Medical Image Segmentation
Marc Fischer
Alexander Bartler
Bin Yang
SSeg
24
18
0
16 Nov 2022
AlignVE: Visual Entailment Recognition Based on Alignment Relations
AlignVE: Visual Entailment Recognition Based on Alignment Relations
Biwei Cao
Jiuxin Cao
Jie Gui
Jiayun Shen
Bo Liu
Lei He
Yuan Yan Tang
James T. Kwok
26
7
0
16 Nov 2022
A Simple Transformer-Based Model for Ego4D Natural Language Queries
  Challenge
A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge
Sicheng Mo
Fangzhou Mu
Yin Li
24
7
0
16 Nov 2022
Person Text-Image Matching via Text-Feature Interpretability Embedding
  and External Attack Node Implantation
Person Text-Image Matching via Text-Feature Interpretability Embedding and External Attack Node Implantation
Fan Li
Hang Zhou
Huafeng Li
Yafei Zhang
Z. Yu
DiffM
37
5
0
16 Nov 2022
Navigating Connected Memories with a Task-oriented Dialog System
Navigating Connected Memories with a Task-oriented Dialog System
Seungwhan Moon
Satwik Kottur
A. Geramifard
Babak Damavandi
35
2
0
15 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
53
101
0
15 Nov 2022
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision
  Research
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research
J. Bornschein
Alexandre Galashov
Ross Hemsley
Amal Rannen-Triki
Yutian Chen
...
Angeliki Lazaridou
Yee Whye Teh
Andrei A. Rusu
Razvan Pascanu
MarcÁurelio Ranzato
OOD
VLM
AI4TS
44
17
0
15 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling
  Approaches
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Daniel Fried
Nicholas Tomlin
Jennifer Hu
Roma Patel
Aida Nematzadeh
27
6
0
15 Nov 2022
Versatile Diffusion: Text, Images and Variations All in One Diffusion
  Model
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
Xingqian Xu
Zhangyang Wang
Eric Zhang
Kai Wang
Humphrey Shi
DiffM
43
186
0
15 Nov 2022
PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive
  leaRning
PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning
Jelle Luijkx
Zlatan Ajanović
L. Ferranti
Jens Kober
23
3
0
15 Nov 2022
Homomorphic Self-Supervised Learning
Homomorphic Self-Supervised Learning
Thomas Anderson Keller
Xavier Suau
Luca Zappella
SSL
21
2
0
15 Nov 2022
CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive
  Learning
CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang
Hongbin Liu
Jinyuan Jia
Neil Zhenqiang Gong
AAML
35
20
0
15 Nov 2022
Self-supervised remote sensing feature learning: Learning Paradigms,
  Challenges, and Future Works
Self-supervised remote sensing feature learning: Learning Paradigms, Challenges, and Future Works
Chao Tao
Ji Qi
Mingning Guo
Qing Zhu
Haifeng Li
SSL
31
56
0
15 Nov 2022
Multilingual and Multimodal Topic Modelling with Pretrained Embeddings
Multilingual and Multimodal Topic Modelling with Pretrained Embeddings
Elaine Zosa
Lidia Pivovarova
BDL
18
8
0
15 Nov 2022
FedTune: A Deep Dive into Efficient Federated Fine-Tuning with
  Pre-trained Transformers
FedTune: A Deep Dive into Efficient Federated Fine-Tuning with Pre-trained Transformers
Jinyu Chen
Wenchao Xu
Song Guo
Junxiao Wang
Jie Zhang
Yining Qi
FedML
33
32
0
15 Nov 2022
YORO -- Lightweight End to End Visual Grounding
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
21
21
0
15 Nov 2022
Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning
Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning
Shangchao Su
Min Yang
Bin Li
Xiangyang Xue
VLM
FedML
38
18
0
15 Nov 2022
Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image
  Generation
Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
Zhihong Pan
Xiaoxia Zhou
Hao Tian
DiffM
20
11
0
14 Nov 2022
Previous
123...182183184...203204205
Next