ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIPVLM
ArXiv (abs)PDFHTMLGithub (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Title
The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities
Yongwei Che
Benjamin Eysenbach
68
1
0
20 Jan 2025
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks?
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks?
Wenxuan Li
Alan Yuille
Zongwei Zhou
MedIm
135
10
0
20 Jan 2025
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
Ruojun Xu
Weijie Xi
Xiaodi Wang
Yongbo Mao
Zach Cheng
DiffM
104
1
0
20 Jan 2025
Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance
Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance
Jin Zhu
Huimin Ma
Jiansheng Chen
Jian Yuan
132
4
0
20 Jan 2025
Dynamic Scene Understanding from Vision-Language Representations
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
485
0
0
20 Jan 2025
IncSAR: A Dual Fusion Incremental Learning Framework for SAR Target Recognition
IncSAR: A Dual Fusion Incremental Learning Framework for SAR Target Recognition
George Karantaidis
Athanasios Pantsios
Y. Kompatsiaris
Symeon Papadopoulos
CLL
125
1
0
20 Jan 2025
Geometric Median (GM) Matching for Robust Data Pruning
Geometric Median (GM) Matching for Robust Data Pruning
Anish Acharya
Inderjit S Dhillon
Sujay Sanghavi
AAML
121
0
0
20 Jan 2025
ACE: Anatomically Consistent Embeddings in Composition and Decomposition
ACE: Anatomically Consistent Embeddings in Composition and Decomposition
Ziyu Zhou
Haozhe Luo
M. Taher
Jiaxuan Pang
Xiaowei Ding
Michael B. Gotway
Jianming Liang
MedIm
137
0
0
20 Jan 2025
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection
Yuanze Li
Haolin Wang
Shihao Yuan
Ming-Yu Liu
Debin Zhao
Yiwen Guo
Chen Xu
Guangming Shi
Wangmeng Zuo
157
33
0
20 Jan 2025
ASTRA: A Scene-aware TRAnsformer-based model for trajectory prediction
ASTRA: A Scene-aware TRAnsformer-based model for trajectory prediction
Izzeddin Teeti
Aniket Thomas
Munish Monga
S. Kumar
Uddeshya Singh
Andrew Bradley
Biplab Banerjee
Fabio Cuzzolin
106
1
0
20 Jan 2025
Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding
Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding
Kohei Torimi
Ryosuke Yamada
Daichi Otsuka
Kensho Hara
Yuki M. Asano
Hirokatsu Kataoka
Y. Aoki
3DV
106
0
0
20 Jan 2025
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
Chen Cai
Zheng Wang
J. Gao
Wenyang Liu
Ye Lu
Runzhong Zhang
Kim-Hui Yap
CLL
119
2
0
20 Jan 2025
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou
Amine Ouasfi
Vincent Gripon
A. Boukhayma
VLM
146
0
0
19 Jan 2025
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
87
3
0
19 Jan 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Ziyang Chen
Mingxiao Li
Zhongfu Chen
Nan Du
Xiaolong Li
Yuexian Zou
121
1
0
19 Jan 2025
MedFILIP: Medical Fine-grained Language-Image Pre-training
MedFILIP: Medical Fine-grained Language-Image Pre-training
Xinjie Liang
Xiangyu Li
Fanding Li
Jie Jiang
Qing Dong
Wei Wang
Kaidi Wang
Suyu Dong
Gongning Luo
Shuo Li
LM&MAVLMMedIm
144
4
0
18 Jan 2025
Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection
Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection
Yifang Xu
Yunzhuo Sun
Benxiang Zhai
Zien Xie
Youyao Jia
S. Du
102
3
0
18 Jan 2025
Human Activity Recognition in an Open World
Human Activity Recognition in an Open World
D. Prijatelj
Samuel Grieggs
Jin Huang
Dawei Du
Ameya Shringi
Christopher Funk
Adam Kaufman
Eric Robertson
Walter J. Scheirer University of Notre Dame
123
3
0
17 Jan 2025
A General Framework for Inference-time Scaling and Steering of Diffusion Models
A General Framework for Inference-time Scaling and Steering of Diffusion Models
R. Singhal
Zachary Horvitz
Ryan Teehan
Mengye Ren
Zhou Yu
Kathleen McKeown
Rajesh Ranganath
DiffM
130
31
0
17 Jan 2025
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
Divyansh Srivastava
Beatriz Cabrero-Daniel
Christian Berger
VLM
163
15
0
17 Jan 2025
TextureCrop: Enhancing Synthetic Image Detection through Texture-based Cropping
TextureCrop: Enhancing Synthetic Image Detection through Texture-based Cropping
Despina Konstantinidou
C. Koutlis
Symeon Papadopoulos
133
3
0
17 Jan 2025
Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis
Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis
Hongyu Sun
Qiuhong Ke
Yanjie Wang
Wang Chen
Kang Yang
Deying Li
Jianfei Cai
3DPC
180
3
0
17 Jan 2025
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park
Sangdoo Yun
Jin-Hwa Kim
Junho Kim
Geonhui Jang
Yonghyun Jeong
Junghyo Jo
Gayoung Lee
142
19
0
17 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CELM&MAVLM
280
26
0
17 Jan 2025
Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning
Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning
Alain Komaty
Hatef Otroshi-Shahreza
Anjith George
S´ebastien Marcel
AAML
144
2
0
15 Jan 2025
Joint Learning of Depth and Appearance for Portrait Image Animation
Joint Learning of Depth and Appearance for Portrait Image Animation
Xinya Ji
Gaspard Zoss
Prashanth Chandran
Lingchen Yang
Xun Cao
B. Solenthaler
D. Bradley
3DHMDE
113
1
0
15 Jan 2025
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Zhiyong Yang
Pingping Zhang
Huchuan Lu
82
3
0
15 Jan 2025
Detecting Contextual Anomalies by Discovering Consistent Spatial Regions
Detecting Contextual Anomalies by Discovering Consistent Spatial Regions
Zhengye Yang
Richard J. Radke
95
0
0
14 Jan 2025
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics
Georgii Gotin
E. Shumitskaya
Anastasia Antsiferova
D. Vatolin
AAML
86
0
0
14 Jan 2025
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
Diego A. Velázquez
Pau Rodríguez López
Sergio Alonso
Josep M. Gonfaus
Jordi Gonzalez
Gerardo Richarte
Javier Marin
Yoshua Bengio
Alexandre Lacoste
98
1
0
14 Jan 2025
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim
Ju He
Qihang Yu
Chenglin Yang
Xiaohui Shen
Suha Kwak
Liang-Chieh Chen
VLM
118
11
0
13 Jan 2025
Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis
Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis
Andrzej D. Dobrzycki
Ana M. Bernardos
Luca Bergesio
Andrzej Pomirski
Daniel Sáez-Trigueros
3DH
117
3
0
13 Jan 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano
Min Woo Sun
James Burgess
Liangyu Chen
Jeffrey Nirschl
...
Xiaohan Wang
Yuhui Zhang
Alfred Seunghoon Song
Robert Tibshirani
Serena Yeung-Levy
LM&MAVLMMedIm
144
10
0
13 Jan 2025
A Hessian-informed hyperparameter optimization for differential learning rate
A Hessian-informed hyperparameter optimization for differential learning rate
Shiyun Xu
Zhiqi Bu
Yiliang Zhang
Ian Barnett
99
1
0
12 Jan 2025
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
Zixuan Chen
Jing Huo
Yangtao Chen
Yang Gao
140
4
0
11 Jan 2025
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Xuanle Zhao
Xianzhen Luo
Qi Shi
Chong Chen
Shuo Wang
Wanxiang Che
Zhiyuan Liu
MLLM
114
12
0
11 Jan 2025
CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems
CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems
Haichao Liu
Ruoyu Yao
Wenru Liu
Zhenmin Huang
Shaojie Shen
Jun Ma
66
3
0
10 Jan 2025
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
125
3
0
10 Jan 2025
Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation
Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation
Minxing Luo
Zixun Xia
L. Chen
Zhenhang Li
Weichao Zeng
Jinqiao Wang
Wentao Cheng
Yaxing Wang
Yu Zhou
Jian Yang
DiffM
132
1
0
10 Jan 2025
TextToucher: Fine-Grained Text-to-Touch Generation
TextToucher: Fine-Grained Text-to-Touch Generation
Jiahang Tu
Hao Fu
Fengyu Yang
Hanbin Zhao
Chao Zhang
Hui Qian
VLMDiffM
130
12
0
10 Jan 2025
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning
Muhammad Awais
Ali Husain Salem Abdulla Alharthi
Amandeep Kumar
Hisham Cholakkal
Rao Muhammad Anwer
VLM
101
5
0
10 Jan 2025
Multi-Task Model Merging via Adaptive Weight Disentanglement
Multi-Task Model Merging via Adaptive Weight Disentanglement
Feng Xiong
Runxi Cheng
Wang Chen
Zhanqiu Zhang
Yiwen Guo
Chun Yuan
Ruifeng Xu
MoMe
189
8
0
10 Jan 2025
MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
Daniele Molino
Francesco Di Feola
E. Faiella
Deborah Fazzini
D. Santucci
Linlin Shen
V. Guarrasi
Paolo Soda
SyDaMedIm
120
1
0
10 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
236
134
0
10 Jan 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
100
9
0
10 Jan 2025
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
Mengting Wei
Tuomas Varanka
Xingxun Jiang
Huai-Qian Khor
Guoying Zhao
DiffM
96
0
0
10 Jan 2025
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
Aaron Lohner
Francesco Compagno
Jonathan M Francis
A. Oltramari
132
3
0
10 Jan 2025
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Sheng Zhang
Yanbo Xu
Naoto Usuyama
Hanwen Xu
J. Bagga
...
Carlo Bifulco
M. Lungren
Tristan Naumann
Sheng Wang
Hoifung Poon
LM&MAMedIm
227
233
0
10 Jan 2025
FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
Yi Yuan
Xubo Liu
Haohe Liu
Mark D. Plumbley
Wenwu Wang
115
9
0
10 Jan 2025
Multi-subject Open-set Personalization in Video Generation
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Yuwei Fang
Kwot Sin Lee
Ivan Skorokhodov
Kfir Aberman
Jun-Yan Zhu
Ming-Hsuan Yang
Sergey Tulyakov
DiffMVGen
172
13
0
10 Jan 2025
Previous
123...131415...333435
Next