ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision


26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 9,339 papers shown
EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation
Zibin Dong
Fei Ni
Yifu Yuan
Yinchuan Li
Jianye Hao
24
0
0
15 May 2025
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
Bin-Bin Gao
Yue Zhu
Jiangtao Yan
Y. Cai
W. Zhang
Meng Wang
Jun Liu
Y. Liu
L. Wang
Chengjie Wang
VLM
38
0
0
15 May 2025
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation
Yanbo Ding
Xirui Hu
Zhizhi Guo
Y. Wang
DiffM
VGen
31
0
0
15 May 2025
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
Daniel A. P. Oliveira
D. Matos
VGen
27
0
0
15 May 2025
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan
S.
Ankur Aggarwal
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
VLM
27
0
0
15 May 2025
Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
Yiwen Liu
Jessica Bader
Jae Myung Kim
DiffM
16
0
0
15 May 2025
Modeling Saliency Dataset Bias
Matthias Kümmerer
Harneet Khanuja
Matthias Bethge
21
0
0
15 May 2025
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
OffRL
VLM
27
0
0
15 May 2025
A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
Jie Zhu
Jirong Zha
Ding Li
Leye Wang
31
0
0
15 May 2025
MorphGuard: Morph Specific Margin Loss for Enhancing Robustness to Face Morphing Attacks
Iurii Medvedev
Nuno Gonçalves
AAML
CVBM
45
0
0
15 May 2025
Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning
Milan Ganai
Rohan Sinha
Christopher Agia
D. Morton
Marco Pavone
OffRL
LRM
AI4CE
27
0
0
15 May 2025
Style Customization of Text-to-Vector Generation with Image Diffusion Priors
P. Zhang
Nanxuan Zhao
Jing Liao
DiffM
24
0
0
15 May 2025
MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning
Y. Wang
Shuai Xu
Xuelin Zhu
Y. Li
VLM
11
0
0
15 May 2025
AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenge
Ranjan Sapkota
Konstantinos I Roumeliotis
Manoj Karkee
AI4TS
24
0
0
15 May 2025
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang
Boyang Zheng
Xichen Pan
Sayak Paul
Saining Xie
24
0
0
15 May 2025
Demystifying AI Agents: The Final Generation of Intelligence
Kevin J McNamara
Rhea Pritham Marpu
24
0
0
15 May 2025
ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data
Chengsen Wang
Qi Qi
Zhongwen Rao
Lujia Pan
Jingyu Wang
Jianxin Liao
AI4TS
19
0
0
15 May 2025
Coherent Language Reconstruction from Brain Recordings with Flexible Multi-Modal Input Stimuli
Chunyu Ye
Shaonan Wang
AI4CE
19
0
0
15 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François G. Germain
Michael J. Jones
Moitreya Chatterjee
18
0
0
14 May 2025
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Bin-Bin Gao
31
4
0
14 May 2025
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
B. Ke
Kevin Qu
T. Wang
Nando Metzger
Shengyu Huang
Bo Li
Anton Obukhov
Konrad Schindler
DiffM
VLM
22
0
0
14 May 2025
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
Guan Gui
Bin-Bin Gao
J. Liu
Chengjie Wang
Y. Wu
DiffM
26
0
0
14 May 2025
Virtual Dosimetrists: A Radiotherapy Training "Flight Simulator"
S. Gay
T. Netherton
Barbara Marquez
Raymond P. Mumme
Mary P. Gronberg
Brent Parker
Chelsea Pinnix
Sanjay Shete
Carlos Cardenas
Laurence E. Court
19
0
0
14 May 2025
Unfettered Forceful Skill Acquisition with Physical Reasoning and Coordinate Frame Labeling
William Xie
Max Conway
Yutong Zhang
N. Correll
LM&Ro
LRM
32
0
0
14 May 2025
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
Bin-Bin Gao
VLM
22
0
0
14 May 2025
A Multimodal Multi-Agent Framework for Radiology Report Generation
Ziruo Yi
Ting Xiao
Mark V. Albert
MedIm
21
0
0
14 May 2025
MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
Siyuan Yan
X. Li
Ming Hu
Yiwen Jiang
Zhen Yu
Zongyuan Ge
MedIm
VLM
23
0
0
14 May 2025
Explainability Through Human-Centric Design for XAI in Lung Cancer Detection
Amy Rafferty
Rishi Ramaesh
Ajitha Rajan
16
0
0
14 May 2025
EnerVerse-AC: Envisioning Embodied Environments with Action Condition
Yuxin Jiang
Shengcong Chen
Siyuan Huang
Liliang Chen
Pengfei Zhou
...
Xindong He
Chiming Liu
Hongsheng Li
Maoqing Yao
Guanghui Ren
11
0
0
14 May 2025
An Initial Exploration of Default Images in Text-to-Image Generation
Hannu Simonen
Atte Kiviniemi
Jonas Oppenlaender
VLM
18
0
0
14 May 2025
Recent Advances in Medical Imaging Segmentation: A Survey
Fares Bougourzi
Abdenour Hadid
OOD
44
0
0
14 May 2025
Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
Yili He
Yan Zhu
Peiyao Fu
Ruijie Yang
Tianyi Chen
Zhihua Wang
Quanlin Li
Pinghong Zhou
X. J. Yang
Shuo Wang
MedIm
VLM
26
0
0
14 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
22
0
0
14 May 2025
Text-driven Motion Generation: Overview, Challenges and Directions
Ali Rida Sahili
Najett Neji
Hedi Tabia
VGen
33
0
0
14 May 2025
Dyadic Mamba: Long-term Dyadic Human Motion Synthesis
Julian Tanke
Takashi Shibuya
Kengo Uchida
Koichi Saito
Yuki Mitsufuji
Mamba
42
0
0
14 May 2025
Promoting SAM for Camouflaged Object Detection via Selective Key Point-based Guidance
Guoying Liang
Su Yang
26
0
0
14 May 2025
FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models
Hongyang Wang
Yichen Shi
Zhuofu Tao
Yuhao Gao
L. Zhang
Xun Lin
Jun Feng
Xiaochen Yuan
Zitong Yu
Xiaochun Cao
CVBM
AAML
20
0
0
14 May 2025
Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
Muzammil Behzad
VLM
23
0
0
14 May 2025
Controllable Image Colorization with Instance-aware Texts and Masks
Yanru An
Ling Gui
Qiang Hu
Chunlei Cai
Tianxiao Ye
Xiaoyun Zhang
Yanfeng Wang
DiffM
34
0
0
13 May 2025
Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion
Anle Ke
Xu Zhang
Tong Chen
Ming-Tse Lu
Chao Zhou
Jiawen Gu
Zhan Ma
DiffM
30
0
0
13 May 2025
Decoding Neighborhood Environments with Large Language Models
Andrew Cart
Shaohu Zhang
Melanie Escue
Xugui Zhou
Haitao Zhao
Prashanth BusiReddyGari
Beiyu Lin
Shuang Li
21
0
0
13 May 2025
SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model
Zhanjie Zhang
Quanwei Zhang
Junsheng Luan
Mengyuan Yang
Yun Wang
Lei Zhao
21
0
0
13 May 2025
Parameter-Efficient Fine-Tuning of Vision Foundation Model for Forest Floor Segmentation from UAV Imagery
Mohammad Wasil
Ahmad Drak
Brennan Penfold
Ludovico Scarton
Maximilian Johenneken
Alexander Asteroth
Sebastian Houben
19
0
0
13 May 2025
Leveraging Multi-Modal Information to Enhance Dataset Distillation
Zhe Li
Hadrien Reynaud
Bernhard Kainz
DD
45
0
0
13 May 2025
DSADF: Thinking Fast and Slow for Decision Making
Alex Zhihao Dou
Dongfei Cui
Jun Yan
W. Wang
Benteng Chen
Haoming Wang
Zeke Xie
Shufei Zhang
OffRL
38
0
0
13 May 2025
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
38
0
0
13 May 2025
ORACLE-Grasp: Zero-Shot Task-Oriented Robotic Grasping using Large Multimodal Models
Avihai Giuili
Rotem Atari
A. Sintov
VLM
22
0
0
13 May 2025
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
36
0
0
13 May 2025
Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting
Zheang Huai
Hui Tang
Yi Li
Z. Chen
Xiaomeng Li
VLM
33
0
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
49
0
0
13 May 2025