ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 9,564 papers shown
Feasibility with Language Models for Open-World Compositional Zero-Shot Learning
Jae Myung Kim
Stephan Alaniz
Cordelia Schmid
Zeynep Akata
12
0
0
16 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS
LRM
12
0
0
16 May 2025
Generative Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges
Yuan Zhang
Xinfeng Zhang
Xiaoming Qi
Xinyu Wu
Feng Chen
Guanyu Yang
Huazhu Fu
MedIm
LM&MA
AI4CE
22
0
0
16 May 2025
Geofenced Unmanned Aerial Robotic Defender for Deer Detection and Deterrence (GUARD)
Ebasa Temesgen
Mario Jerez
Greta Brown
Graham Wilson
Sree Ganesh Lalitaditya Divakarla
Sarah Boelter
Oscar Nelson
Robert McPherson
Maria L. Gini
17
0
0
16 May 2025
MergeBench: A Benchmark for Merging Domain-Specialized LLMs
Yifei He
Siqi Zeng
Yuzheng Hu
Rui Yang
Tong Zhang
Han Zhao
MoMe
ALM
19
0
0
16 May 2025
Search-TTA: A Multimodal Test-Time Adaptation Framework for Visual Search in the Wild
Derek Ming Siang Tan
Shailesh
Boyang Liu
Alok Raj
Qi Xuan Ang
...
Tanishq Duhan
Jimmy Chiun
Yuhong Cao
Florian Shkurti
Guillaume Sartoretti
12
0
0
16 May 2025
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
17
0
0
16 May 2025
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation
Zihan Wang
Seungjun Lee
Gim Hee Lee
VGen
7
0
0
16 May 2025
DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models
Giulia Bertazzini
Daniele Baracchi
D. Shullani
Isao Echizen
A. Piva
19
0
0
16 May 2025
Pseudo-Label Quality Decoupling and Correction for Semi-Supervised Instance Segmentation
Jianghang Lin
Yilin Lu
Yunhang Shen
Chaoyang Zhu
Shengchuan Zhang
Liujuan Cao
Rongrong Ji
ISeg
26
0
0
16 May 2025
Context-Aware Probabilistic Modeling with LLM for Multimodal Time Series Forecasting
Yueyang Yao
Jiajun Li
Xingyuan Dai
MengMeng Zhang
Xiaoyan Gong
Fei-Yue Wang
Yisheng Lv
AI4TS
22
0
0
16 May 2025
NeuSEditor: From Multi-View Images to Text-Guided Neural Surface Edits
Nail Ibrahimli
Julian F. P. Kooij
Liangliang Nan
7
0
0
16 May 2025
Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
Zongye Zhang
Bohan Kong
Qingjie Liu
Yuanda Wang
DiffM
14
0
0
16 May 2025
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
Raghuveer Thirukovalluru
Rui Meng
Lingjuan Lyu
K. Krishnamoorthi
Mingyi Su
Ping Nie
Semih Yavuz
Yingbo Zhou
Wenhu Chen
Bhuwan Dhingra
17
0
0
16 May 2025
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision
Alexey Magay
Dhurba Tripathi
Yu Hao
Yi Fang
12
0
0
16 May 2025
ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization
Bo Du
Xuekang Zhu
Xiaochen Ma
Chenfan Qu
Kaiwen Feng
Zhe Yang
Chi-Man Pun
Jian Liu
Jizhe Zhou
17
0
0
16 May 2025
Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs
Abhishek Dey
Aabha Bothera
Samhita Sarikonda
Rishav Aryan
Sanjay Kumar Podishetty
Akshay Havalgi
Gaurav Singh
Saurabh Srivastava
7
0
0
16 May 2025
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution
Junyi Yuan
Jian Zhang
Fangyu Wu
Dongming Lu
Huanda Lu
Qiufeng Wang
12
0
0
16 May 2025
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework
Feiran Li
Qianqian Xu
Shilong Bao
Zhiyong Yang
Xiaochun Cao
Qingming Huang
DiffM
12
0
0
16 May 2025
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Lingjuan Lyu
Shengfang Zhai
Mingzhe Du
Y. Chen
Tri Cao
...
Zhaoxin Fan
Kun Wang
Junfeng Fang
Jiaheng Zhang
Bryan Hooi
OffRL
LRM
7
0
0
16 May 2025
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
Bin-Bin Gao
Yue Zhu
Jiangtao Yan
Y. Cai
Weinan Zhang
Meng Wang
Jun Liu
Lingjuan Lyu
L. Wang
Chengjie Wang
VLM
41
0
0
15 May 2025
MorphGuard: Morph Specific Margin Loss for Enhancing Robustness to Face Morphing Attacks
Iurii Medvedev
Nuno Gonçalves
AAML
CVBM
45
0
0
15 May 2025
Style Customization of Text-to-Vector Generation with Image Diffusion Priors
P. Zhang
Nanxuan Zhao
Jing Liao
DiffM
28
0
0
15 May 2025
Coherent Language Reconstruction from Brain Recordings with Flexible Multi-Modal Input Stimuli
Chunyu Ye
Shaonan Wang
AI4CE
19
0
0
15 May 2025
ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data
Chengsen Wang
Qi Qi
Zhongwen Rao
Lujia Pan
Jingyu Wang
Jianxin Liao
AI4TS
19
0
0
15 May 2025
Mitigate Language Priors in Large Vision-Language Models by Cross-Images Contrastive Decoding
Jianfei Zhao
Feng Zhang
X. Sun
Chong Feng
MLLM
28
0
0
15 May 2025
Modeling Saliency Dataset Bias
Matthias Kümmerer
Harneet Khanuja
Matthias Bethge
26
0
0
15 May 2025
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
OffRL
VLM
27
0
0
15 May 2025
MIRAGE: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence
Chonghan Liu
Haoran Wang
Felix Henry
Pu Miao
Yajie Zhang
Yu Zhao
Peiran Wu
VLM
26
0
0
15 May 2025
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang
Boyang Zheng
Xichen Pan
Sayak Paul
Saining Xie
29
0
0
15 May 2025
A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
Jie Zhu
Jirong Zha
Ding Li
Leye Wang
31
0
0
15 May 2025
AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
Ranjan Sapkota
Konstantinos I Roumeliotis
Manoj Karkee
AI4TS
24
0
0
15 May 2025
MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning
Yuhui Wang
Shuai Xu
Xuelin Zhu
Yanggeng Li
VLM
13
0
0
15 May 2025
CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier
Ziyang Ou
VLM
9
0
0
15 May 2025
MOSAIC: A Multi-View 2.5D Organ Slice Selector with Cross-Attentional Reasoning for Anatomically-Aware CT Localization in Medical Organ Segmentation
Hania Ghouse
Muzammil Behzad
12
0
0
15 May 2025
Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
Yiwen Liu
Jessica Bader
Jae Myung Kim
DiffM
16
0
0
15 May 2025
IMAGE-ALCHEMY: Advancing subject fidelity in personalised text-to-image generation
Amritanshu Tiwari
Cherish Puniani
Kaustubh Sharma
Ojasva Nema
DiffM
12
0
0
15 May 2025
Demystifying AI Agents: The Final Generation of Intelligence
Kevin J McNamara
Rhea Pritham Marpu
29
0
0
15 May 2025
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation
Yanbo Ding
Xirui Hu
Zhizhi Guo
Yuhui Wang
DiffM
VGen
31
0
0
15 May 2025
GA3CE: Unconstrained 3D Gaze Estimation with Gaze-Aware 3D Context Encoding
Yuki Kawana
Shintaro Shiba
Quan Kong
N. Kobori
7
0
0
15 May 2025
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan
Shuifa Sun
Ankur Aggarwal
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
VLM
27
0
0
15 May 2025
EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation
Zibin Dong
Fei Ni
Yifu Yuan
Yinchuan Li
Jianye Hao
26
0
0
15 May 2025
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
Daniel A. P. Oliveira
D. Matos
VGen
27
0
0
15 May 2025
Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning
Milan Ganai
Rohan Sinha
Christopher Agia
D. Morton
Marco Pavone
OffRL
LRM
AI4CE
27
0
0
15 May 2025
Explainability Through Human-Centric Design for XAI in Lung Cancer Detection
Amy Rafferty
Rishi Ramaesh
Ajitha Rajan
16
0
0
14 May 2025
Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios
Huafeng Shi
Jianzhong Liang
Rongchang Xie
Xian Wu
Cheng Chen
Chang Liu
VGen
17
0
0
14 May 2025
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
Guan Gui
Bin-Bin Gao
Xiaozhong Liu
Chengjie Wang
Y. Wu
DiffM
31
0
0
14 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
Germani Elodie
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Albarqouni Shadi
AI4CE
17
0
0
14 May 2025
FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models
Hongyang Wang
Yichen Shi
Zhuofu Tao
Yuhao Gao
L. Zhang
Xun Lin
Jun Feng
Xiaochen Yuan
Zitong Yu
Xiaochun Cao
CVBM
AAML
25
0
0
14 May 2025
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
Bin-Bin Gao
VLM
25
0
0
14 May 2025