2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 9,583 papers shown
Title
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Bin-Bin Gao
34
4
0
14 May 2025
Dyadic Mamba: Long-term Dyadic Human Motion Synthesis
Julian Tanke
Takashi Shibuya
Kengo Uchida
Koichi Saito
Yuki Mitsufuji
Mamba
47
0
0
14 May 2025
An Initial Exploration of Default Images in Text-to-Image Generation
Hannu Simonen
Atte Kiviniemi
Jonas Oppenlaender
VLM
23
0
0
14 May 2025
Virtual Dosimetrists: A Radiotherapy Training "Flight Simulator"
S. Gay
Tucker Netherton
Barbara Marquez
Raymond P. Mumme
Mary P. Gronberg
Brent Parker
Chelsea Pinnix
Sanjay Shete
Carlos Cardenas
Laurence Court
19
0
0
14 May 2025
Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios
Huafeng Shi
Jianzhong Liang
Rongchang Xie
Xian Wu
Cheng Chen
Chang Liu
VGen
17
0
0
14 May 2025
MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
Siyuan Yan
X. Li
Ming Hu
Yiwen Jiang
Zhen Yu
Zongyuan Ge
MedIm
VLM
28
0
0
14 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
21
0
0
14 May 2025
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
Bin-Bin Gao
VLM
25
0
0
14 May 2025
Explainability Through Human-Centric Design for XAI in Lung Cancer Detection
Amy Rafferty
Rishi Ramaesh
Ajitha Rajan
16
0
0
14 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
30
0
0
14 May 2025
FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models
Hongyang Wang
Yichen Shi
Zhuofu Tao
Yuhao Gao
L. Zhang
Xun Lin
Jun Feng
Xiaochen Yuan
Zitong Yu
Xiaochun Cao
CVBM
AAML
25
0
0
14 May 2025
A Multimodal Multi-Agent Framework for Radiology Report Generation
Ziruo Yi
Ting Xiao
Mark V. Albert
MedIm
26
0
0
14 May 2025
Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
Yili He
Yan Zhu
Peiyao Fu
Ruijie Yang
Tianyi Chen
Zhihua Wang
Quanlin Li
Pinghong Zhou
X. J. Yang
Shuo Wang
MedIm
VLM
28
0
0
14 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
Germani Elodie
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Albarqouni Shadi
AI4CE
17
0
0
14 May 2025
Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
Muzammil Behzad
VLM
25
0
0
14 May 2025
Unfettered Forceful Skill Acquisition with Physical Reasoning and Coordinate Frame Labeling
William Xie
Max Conway
Yutong Zhang
N. Correll
LM&Ro
LRM
35
0
0
14 May 2025
Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World
Yuran Wang
Yingping Liang
Ying Fu
26
0
0
13 May 2025
Parameter-Efficient Fine-Tuning of Vision Foundation Model for Forest Floor Segmentation from UAV Imagery
Mohammad Wasil
Ahmad Drak
Brennan Penfold
Ludovico Scarton
Maximilian Johenneken
Alexander Asteroth
Sebastian Houben
19
0
0
13 May 2025
Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion
Anle Ke
Xu Zhang
Tong Chen
Ming-Tse Lu
Chao Zhou
Jiawen Gu
Zhan Ma
DiffM
30
0
0
13 May 2025
Visual Image Reconstruction from Brain Activity via Latent Representation
Y. Kamitani
Misato Tanaka
Ken Shirakawa
23
0
0
13 May 2025
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Yangyi Chen
Hao Peng
Tong Zhang
Heng Ji
VLM
28
0
0
13 May 2025
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
38
0
0
13 May 2025
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
Wenxuan Ma
Xiaoge Cao
Yuhui Zhang
Chaofan Zhang
Shaobo Yang
Peng Hao
Bin Fang
Yinghao Cai
Shaowei Cui
Shuo Wang
33
0
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
49
0
0
13 May 2025
Decoding Neighborhood Environments with Large Language Models
Andrew Cart
Shaohu Zhang
Melanie Escue
Xugui Zhou
Haitao Zhao
Prashanth BusiReddyGari
Beiyu Lin
Shuang Li
21
0
0
13 May 2025
ORACLE-Grasp: Zero-Shot Task-Oriented Robotic Grasping using Large Multimodal Models
Avihai Giuili
Rotem Atari
A. Sintov
VLM
27
0
0
13 May 2025
Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
31
0
0
13 May 2025
Behind Maya: Building a Multilingual Vision Language Model
Nahid Alam
Karthik Reddy Kanjula
Surya Guthikonda
Timothy Chung
Bala Krishna S Vegesna
...
Isha Chaturvedi
Genta Indra Winata
Ashvanth.S
Snehanshu Mukherjee
Alham Fikri Aji
MLLM
VLM
30
0
0
13 May 2025
Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting
Zheang Huai
Hui Tang
Yi Li
Zhengzhang Chen
Xiaomeng Li
VLM
33
0
0
13 May 2025
Decoupled Multimodal Prototypes for Visual Recognition with Missing Modalities
Jueqing Lu
Yuanyuan Qi
Xiaohao Yang
Shujie Zhou
Lan Du
29
0
0
13 May 2025
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
38
0
0
13 May 2025
Controllable Image Colorization with Instance-aware Texts and Masks
Yanru An
Ling Gui
Qiang Hu
Chunlei Cai
Tianxiao Ye
Xiaoyun Zhang
Yanfeng Wang
DiffM
34
0
0
13 May 2025
SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model
Zhanjie Zhang
Quanwei Zhang
Junsheng Luan
Mengyuan Yang
Yun Wang
Lei Zhao
21
0
0
13 May 2025
Leveraging Multi-Modal Information to Enhance Dataset Distillation
Zhe Li
Hadrien Reynaud
Bernhard Kainz
DD
45
0
0
13 May 2025
DSADF: Thinking Fast and Slow for Decision Making
Alex Zhihao Dou
Dongfei Cui
Jun Yan
W. Wang
Benteng Chen
Haoming Wang
Zeke Xie
Shufei Zhang
OffRL
41
0
0
13 May 2025
Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models
Songlin Dong
Chenhao Ding
Jiangyang Li
Jizhou Han
Qiang Wang
Yuhang He
Yihong Gong
CLL
VLM
37
0
0
12 May 2025
QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads
Khurram Mazher
Saad Bin Nasir
MQ
47
0
0
12 May 2025
Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection
Yuqi Cheng
Yunkang Cao
Dongfang Wang
Nong Sang
Wenlong Li
34
1
0
12 May 2025
Towards SFW sampling for diffusion models via external conditioning
Camilo Carvajal Reyes
J. Fontbona
Felipe A. Tobar
DiffM
34
0
0
12 May 2025
Incomplete In-context Learning
Wenqiang Wang
Yangshijie Zhang
36
0
0
12 May 2025
SLAG: Scalable Language-Augmented Gaussian Splatting
Laszlo Szilagyi
Francis Engelmann
Jeannette Bohg
3DGS
47
0
0
12 May 2025
MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
Aybora Koksal
Aydin Alatan
LRM
24
0
0
12 May 2025
You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts
Hongkun Dou
Zeyu Li
Xingyu Jiang
Hao Li
Lijun Yang
Wen Yao
Yue Deng
DiffM
38
0
0
12 May 2025
Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection
Hongda Qin
Xiao Lu
Zhiyong Wei
Yihong Cao
Kailun Yang
Ningjiang Chen
ObjD
MLLM
VLM
31
0
0
12 May 2025
Position: Restructuring of Categories and Implementation of Guidelines Essential for VLM Adoption in Healthcare
Amara Tariq
Rimita Lahiri
Charles Kahn
Imon Banerjee
26
0
0
12 May 2025
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie
Zequn Zeng
Hao Zhang
Yucheng Ding
Yuxiang Wang
Zhengjue Wang
Bo Chen
Hongwei Liu
OT
33
0
0
12 May 2025
No Query, No Access
Luu Anh Tuan
Siyuan Liang
Yuhui Zhang
Xiaojun Jia
Hao Lin
Xiaochun Cao
AAML
26
0
0
12 May 2025
FLUXSynID: A Framework for Identity-Controlled Synthetic Face Generation with Document and Live Images
Raul Ismayilov
Dzemila Sero
Luuk Spreeuwers
29
0
0
12 May 2025
DanceGRPO: Unleashing GRPO on Visual Generation
Zeyue Xue
Jie Wu
Yu Gao
Fangyuan Kong
Lingting Zhu
...
Zhiheng Liu
Wei Liu
Qiushan Guo
Weilin Huang
Ping Luo
EGVM
VGen
52
0
0
12 May 2025
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
57
0
0
12 May 2025