ResearchTrend.AI
arXiv: 2103.00020
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 10,528 papers shown
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonçalves de Aguiar Neto
C. F. G. Santos
35
23
0
18 May 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLM
MLLM
67
307
0
18 May 2023
TextDiffuser: Diffusion Models as Text Painters
Jingye Chen
Yupan Huang
Tengchao Lv
Lei Cui
Qifeng Chen
Furu Wei
66
116
0
18 May 2023
LDM3D: Latent Diffusion Model for 3D
Gabriela Ben-Melech Stan
Diana Wofk
Scottie Fox
Alex Redden
Will Saxton
...
Estelle Aflalo
Shao-Yen Tseng
Fabio Nonato
Matthias Muller
Vasudev Lal
35
45
0
18 May 2023
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts
Qiuhui Chen
Xinyue Hu
Zirui Wang
Yi Hong
LM&MA
MedIm
30
35
0
18 May 2023
OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding
Minghua Liu
Ruoxi Shi
Kaiming Kuang
Yinhao Zhu
Xuanlin Li
Shizhong Han
H. Cai
Fatih Porikli
Hao Su
3DPC
53
118
0
18 May 2023
Segment Any Anomaly without Training via Hybrid Prompt Regularization
Yunkang Cao
Xiaohao Xu
Chen Sun
Y. Cheng
Zongwei Du
Liang Gao
Nong Sang
VLM
50
72
0
18 May 2023
Language Models Meet World Models: Embodied Experiences Enhance Language Models
Jiannan Xiang
Tianhua Tao
Yi Gu
Tianmin Shu
Zirui Wang
Zichao Yang
Zhiting Hu
ALM
LLMAG
LM&Ro
CLL
52
94
0
18 May 2023
CLIP-GCD: Simple Language Guided Generalized Category Discovery
Rabah Ouldnoughi
Chia-Wen Kuo
Z. Kira
VLM
42
14
0
17 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
62
75
0
17 May 2023
Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
Alvin Heng
Harold Soh
VLM
KELM
DiffM
45
109
0
17 May 2023
Dual Semantic Knowledge Composed Multimodal Dialog Systems
Xiaolin Chen
Xuemeng Song
Yin-wei Wei
Liqiang Nie
Tat-Seng Chua
18
8
0
17 May 2023
A Method for Training-free Person Image Picture Generation
Tianyu Chen
DiffM
26
0
0
16 May 2023
Prompt-Tuning Decision Transformer with Preference Ranking
Shengchao Hu
Li Shen
Ya Zhang
Dacheng Tao
OffRL
43
14
0
16 May 2023
SoundStorm: Efficient Parallel Audio Generation
Zalan Borsos
Matthew Sharifi
Damien Vincent
Eugene Kharitonov
Neil Zeghidour
Marco Tagliasacchi
28
98
0
16 May 2023
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Zhangxuan Gu
Zhuoer Xu
Haoxing Chen
Jun Lan
Changhua Meng
Weiqiang Wang
25
4
0
16 May 2023
Iterative Adversarial Attack on Image-guided Story Ending Generation
Youze Wang
Wenbo Hu
Richang Hong
46
3
0
16 May 2023
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Yuyang Zhao
Enze Xie
Lanqing Hong
Zhenguo Li
G. Lee
DiffM
VGen
49
33
0
15 May 2023
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Zhimin Chen
Longlong Jing
Yingwei Li
Bing Li
37
31
0
15 May 2023
A Reproducible Extraction of Training Images from Diffusion Models
Ryan Webster
29
33
0
15 May 2023
Component-aware anomaly detection framework for adjustable and logical industrial visual inspection
Tongkun Liu
Bing Li
Xiao Du
Bingke Jiang
Xiao Jin
Liuyi Jin
Zhu Zhao
37
27
0
15 May 2023
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
Peipei Liu
Hong Li
Yimo Ren
Jie Liu
Shuaizong Si
Hongsong Zhu
Limin Sun
31
2
0
15 May 2023
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
Le Xue
Ning Yu
Shu Zhen Zhang
Artemis Panagopoulou
Junnan Li
...
Jiajun Wu
Caiming Xiong
Ran Xu
Juan Carlos Niebles
Silvio Savarese
34
118
0
14 May 2023
Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity
Raman Dutt
Linus Ericsson
Pedro Sanchez
Sotirios A. Tsaftaris
Timothy M. Hospedales
MedIm
48
50
0
14 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
48
90
0
14 May 2023
How to Train Your CheXDragon: Training Chest X-Ray Models for Transfer to Novel Tasks and Healthcare Systems
Cara Van Uden
Jeremy Irvin
Mars Huang
N. Dean
J. Carr
A. Ng
C. Langlotz
OOD
37
1
0
13 May 2023
Consistency Regularization for Domain Generalization with Logit Attribution Matching
Han Gao
Kaican Li
Weiyan Xie
Zhi Lin
Yongxiang Huang
Luning Wang
Caleb Chen Cao
N. Zhang
34
2
0
13 May 2023
Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution
Jianfeng Kuang
Wei Hua
Dingkang Liang
Mingkun Yang
Deqiang Jiang
Bo Ren
Xiang Bai
40
39
0
12 May 2023
IMAGINATOR: Pre-Trained Image+Text Joint Embeddings using Word-Level Grounding of Images
Varuna Krishna
S. Suryavardan
Shreyash Mishra
Sathyanarayanan Ramamoorthy
Parth Patwa
Megha Chakraborty
Aman Chadha
Amitava Das
Amit P. Sheth
VLM
33
3
0
12 May 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Joey Tianyi Zhou
61
131
0
11 May 2023
Patchwork Learning: A Paradigm Towards Integrative Analysis across Diverse Biomedical Data Sources
Suraj Rajendran
Weishen Pan
M. Sabuncu
Yong Chen
Jiayu Zhou
Fei Wang
68
14
0
10 May 2023
Technical Understanding from IML Hands-on Experience: A Study through a Public Event for Science Museum Visitors
Wataru Kawabe
Yuri Nakao
Akihisa Shitara
Yusuke Sugano
47
1
0
10 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
37
40
0
09 May 2023
$2 * n$ is better than $n^2$: Decomposing Event Coreference Resolution into Two Tractable Problems
Shafiuddin Rehan Ahmed
Abhijnan Nath
James H. Martin
Nikhil Krishnaswamy
50
13
0
09 May 2023
SRIL: Selective Regularization for Class-Incremental Learning
Jisu Han
Jaemin Na
Wonjun Hwang
CLL
74
0
0
09 May 2023
Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval
Shiyin Dong
Mingrui Zhu
N. Wang
Xinbo Gao
VLM
43
3
0
09 May 2023
Comparing Foundation Models using Data Kernels
Brandon Duderstadt
Hayden S. Helm
Carey E. Priebe
34
5
0
09 May 2023
Tomography of Quantum States from Structured Measurements via quantum-aware transformer
Hailan Ma
Zhenhong Sun
Daoyi Dong
Chunlin Chen
H. Rabitz
45
3
0
09 May 2023
Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video
Zenan Xu
Xiaojun Meng
Yasheng Wang
Qinliang Su
Zexuan Qiu
Xin Jiang
Qun Liu
38
3
0
08 May 2023
Augmented Large Language Models with Parametric Knowledge Guiding
Ziyang Luo
Can Xu
Pu Zhao
Xiubo Geng
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
KELM
RALM
45
44
0
08 May 2023
LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
Peng Xia
Di Xu
Ming Hu
Lie Ju
Zongyuan Ge
VLM
48
11
0
08 May 2023
Scene Text Recognition with Image-Text Matching-guided Dictionary
Jiajun Wei
Hongjian Zhan
X. Tu
Yue Lu
Umapada Pal
VLM
22
0
0
08 May 2023
Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation
Chaoya Jiang
Wei Ye
Haiyang Xu
Miang Yan
Shikun Zhang
Jie Zhang
Fei Huang
VLM
56
15
0
08 May 2023
Locally Attentional SDF Diffusion for Controllable 3D Shape Generation
Xin-Yang Zheng
Hao Pan
Peng-Shuai Wang
Xin Tong
Yang Liu
H. Shum
55
127
0
08 May 2023
Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
Shengfang Zhai
Yinpeng Dong
Qingni Shen
Shih-Chieh Pu
Yuejian Fang
Hang Su
38
73
0
07 May 2023
"When Words Fail, Emojis Prevail": Generating Sarcastic Utterances with
  Emoji Using Valence Reversal and Semantic Incongruity
Faria Binte Kader
Nafisa Hossain Nujat
Tasmia Binte Sogir
Mohsinul Kabir
H. Mahmud
Md. Kamrul Hasan
47
1
0
06 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
50
36
0
05 May 2023
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey
Yichi Zhang
Rushi Jiao
MedIm
VLM
50
26
0
05 May 2023
Guided Image Synthesis via Initial Image Editing in Diffusion Model
Jiafeng Mao
Xueting Wang
Kiyoharu Aizawa
DiffM
40
52
0
05 May 2023
LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics
Shervin Ardeshir
41
0
0
04 May 2023