ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIPVLM
ArXiv (abs)PDFHTMLGithub (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Title
Visual Agents as Fast and Slow Thinkers
Visual Agents as Fast and Slow Thinkers
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
LLMAGLRM
190
18
0
16 Aug 2024
Can Large Language Models Understand Symbolic Graphics Programs?
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
117
14
0
15 Aug 2024
Attention-Guided Perturbation for Unsupervised Image Anomaly Detection
Attention-Guided Perturbation for Unsupervised Image Anomaly Detection
Tingfeng Huang
Yuxuan Cheng
Jingbo Xia
Rui Yu
Yuxuan Cai
Jinhai Xiang
Xinwei He
AAML
241
0
0
14 Aug 2024
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Seung Hyun Lee
Junjie Ke
Yinxiao Li
Junfeng He
Steven Hickson
...
Irfan Essa
Sangpil Kim
Ming-Hsuan Yang
Irfan Essa
Feng Yang
VLM
101
1
0
14 Aug 2024
Masked Image Modeling: A Survey
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
164
8
0
13 Aug 2024
A Review of Pseudo-Labeling for Computer Vision
A Review of Pseudo-Labeling for Computer Vision
Patrick Kage
Jay C. Rothenberger
Pavlos Andreadis
Dimitrios I. Diochnos
VLM
103
7
0
13 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
L. Wang
Jianwu Dang
J. Tao
AI4TS
90
0
0
11 Aug 2024
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning
Yuze Zhao
Jintao Huang
Jinghan Hu
Xingjun Wang
Yunlin Mao
...
Zhikai Wu
Baole Ai
Ang Wang
Wenmeng Zhou
Yingda Chen
101
46
0
10 Aug 2024
Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model
Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model
Jaehyuk Heo
Pilsung Kang
VLM
76
1
0
09 Aug 2024
HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts
HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts
Hongjun Wang
S. Vaze
Kai Han
144
5
0
08 Aug 2024
Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services
Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services
Shaopeng Fu
Xuexue Sun
Ke Qing
Tianhang Zheng
Di Wang
AAMLMIACVSILM
113
0
0
05 Aug 2024
VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments
VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments
Daeun Song
Jing Liang
Xuesu Xiao
Dinesh Manocha
96
6
0
05 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
151
59
0
05 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLMMLLM
170
31
0
04 Aug 2024
A new approach for encoding code and assisting code understanding
A new approach for encoding code and assisting code understanding
Mengdan Fan
Changde Du
Haiyan Zhao
Zhi Jin
137
0
0
01 Aug 2024
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion
Manuel Kansy
Jacek Naruniec
Christopher Schroers
Markus Gross
Romann M. Weber
DiffMVGen
118
4
0
01 Aug 2024
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Danfeng Guo
Sumitaka Honji
LRM
144
2
0
31 Jul 2024
Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection
Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection
Jinfa Huang
Jinsheng Pan
Zhongwei Wan
Hanjia Lyu
Jiebo Luo
102
5
0
30 Jul 2024
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
Hunmin Yang
Jongoh Jeong
Kuk-Jin Yoon
AAMLVLM
164
5
0
30 Jul 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
77
1
0
30 Jul 2024
Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks
Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks
Yunfeng Diao
Na Zhai
Changtao Miao
Xun Yang
Meng Wang
Xun Yang
Meng Wang
AAML
143
2
0
30 Jul 2024
Overcoming Uncertain Incompleteness for Robust Multimodal Sequential Diagnosis Prediction via Curriculum Data Erasing Guided Knowledge Distillation
Overcoming Uncertain Incompleteness for Robust Multimodal Sequential Diagnosis Prediction via Curriculum Data Erasing Guided Knowledge Distillation
Heejoon Koo
120
0
0
28 Jul 2024
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu
Zhengyao Fang
Pengyuan Lyu
Chengquan Zhang
Fanglin Chen
Guangming Lu
Wenjie Pei
140
3
0
28 Jul 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
121
0
0
28 Jul 2024
Parameter-Efficient Fine-Tuning via Circular Convolution
Parameter-Efficient Fine-Tuning via Circular Convolution
Aochuan Chen
Jiashun Cheng
Zijing Liu
Ziqi Gao
Fugee Tsung
Yu-Feng Li
Jia Li
134
3
0
27 Jul 2024
Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints
Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints
Lei Guo
Wei Chen
Yuxuan Sun
Bo Ai
Nikolaos Pappas
T. Quek
DiffM
123
6
0
26 Jul 2024
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
Zhengbo Wang
Jian Liang
Ran He
Zilei Wang
Tieniu Tan
158
28
0
25 Jul 2024
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
Yiming Xie
Chun-Han Yao
Vikram S. Voleti
Huaizu Jiang
Varun Jampani
VGen
126
47
0
24 Jul 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
183
1
0
23 Jul 2024
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
Yingxue Xu
Yihui Wang
Fengtao Zhou
Jiabo Ma
Shu Yang
...
Anjia Han
Ronald Cheong Kin Chan
Li Liang
Xiuming Zhang
Hao Chen
117
22
0
22 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
158
9
0
22 Jul 2024
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Zheng Chong
Xiao Dong
Haoxiang Li
Shiyue Zhang
Wenqing Zhang
Xujie Zhang
Hanqing Zhao
D. Jiang
Xiaodan Liang
DiffM
125
24
0
21 Jul 2024
Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation
Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation
Liwen Sun
James Zhao
Megan Han
Chenyan Xiong
MedIm
111
12
0
21 Jul 2024
A Comprehensive Review of Few-shot Action Recognition
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
155
3
0
20 Jul 2024
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
...
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
VGenDiffM
191
50
0
17 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
113
2
0
17 Jul 2024
I2AM: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
I2AM: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
Junseo Park
Hyeryung Jang
268
1
0
17 Jul 2024
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
Huiguo He
Huan Yang
Zixi Tuo
Yuan Zhou
Qiuyue Wang
Yuhang Zhang
Zeyu Liu
Wenhao Huang
Hongyang Chao
Jian Yin
DiffMVGen
170
17
0
17 Jul 2024
Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection
Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection
Ye Jiang
Yimin Wang
MLLM
128
1
0
16 Jul 2024
FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries
FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries
Yuqi Jiang
Xudong Lu
Qian Jin
Qi Sun
Hanming Wu
Cheng Zhuo
113
7
0
15 Jul 2024
Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
Mang Ning
A. A. Salah
Itir Onal Ertugrul
CVBM
165
5
0
15 Jul 2024
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Phillip Mueller
Lars Mikelsons
AI4CE
105
3
0
15 Jul 2024
Surgical Text-to-Image Generation
Surgical Text-to-Image Generation
C. Nwoye
Rupak Bose
K. Elgohary
Lorenzo Arboit
Giorgio Carlino
Joël L. Lavanchy
Pietro Mascagni
N. Padoy
MedIm
122
4
0
12 Jul 2024
Enrich the content of the image Using Context-Aware Copy Paste
Enrich the content of the image Using Context-Aware Copy Paste
Qiushi Guo
VLM
179
0
0
11 Jul 2024
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Zijie Yue
Miaojing Shi
Hanli Wang
Shuai Ding
Qijun Chen
Shanlin Yang
92
0
0
11 Jul 2024
Video-to-Audio Generation with Hidden Alignment
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Yu Gu
Dong Yu
Dong Yu
DiffMVGen
88
12
0
10 Jul 2024
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab
M. Maruf
Arka Daw
Harish Babu Manogaran
Abhilash Neog
...
Paula Mabee
Wasila Dahdul
Anuj Karpatne
Wasila M Dahdul
Anuj Karpatne
201
4
0
10 Jul 2024
RoboCAS: A Benchmark for Robotic Manipulation in Complex Object Arrangement Scenarios
RoboCAS: A Benchmark for Robotic Manipulation in Complex Object Arrangement Scenarios
Liming Zheng
Feng Yan
Fanfan Liu
Chengjian Feng
Zhuoliang Kang
Lin Ma
126
2
0
09 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
130
5
0
09 Jul 2024
Sequential Contrastive Audio-Visual Learning
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serrà
91
2
0
08 Jul 2024
Previous
123...242526...333435
Next