Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 9,822 papers shown
Title
MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection
Q. Yang
Yuan Yao
Miaomiao Cui
Liefeng Bo
VLM
61
0
0
30 Apr 2025
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi
R. Kaur
Adam D. Cobb
Manoj Acharya
Anirban Roy
Colin Samplawski
Brian Matejek
Alexander M. Berenbeim
Nathaniel D. Bastian
Susmit Jha
28
0
0
30 Apr 2025
CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion
Zhifu Zhao
Hanyang Hua
Jiajian Li
Shaoxin Wu
Fu Li
Yangtao Zhou
Yang Li
DiffM
68
0
0
30 Apr 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
78
0
0
30 Apr 2025
GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers
Xinyu Li
Qi Yao
Yanjie Wang
DiffM
48
0
0
30 Apr 2025
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning
Siyi Chen
Yimeng Zhang
Sijia Liu
Q. Qu
AAML
165
0
0
30 Apr 2025
Visual Text Processing: A Comprehensive Review and Unified Evaluation
Yan Shu
Weichao Zeng
Fangmin Zhao
Zeyu Chen
Zhiyu Li
...
Paolo Rota
Xiang Bai
Lianwen Jin
Xu-Cheng Yin
N. Sebe
CoGe
64
0
0
30 Apr 2025
Erased but Not Forgotten: How Backdoors Compromise Concept Erasure
Jonas Henry Grebe
Tobias Braun
Marcus Rohrbach
Anna Rohrbach
AAML
85
0
0
29 Apr 2025
Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection
Ziqing Fan
Siyuan Du
Shengchao Hu
Pingjie Wang
Li Shen
Wenjie Qu
Dacheng Tao
Yucheng Wang
41
2
0
29 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
77
0
0
29 Apr 2025
FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models
Mainak Singha
Subhankar Roy
Sarthak Mehrotra
Ankit Jha
Moloud Abdar
Biplab Banerjee
Elisa Ricci
VLM
VPVLM
119
0
0
29 Apr 2025
Fine Grain Classification: Connecting Meta using Cross-Contrastive pre-training
Sumit Mamtani
Yash Thesia
26
0
0
29 Apr 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Wenjie Qu
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
54
0
0
29 Apr 2025
MemeBLIP2: A novel lightweight multimodal system to detect harmful memes
Jiaqi Liu
Ran Tong
Aowei Shen
Shuzheng Li
Changlin Yang
Lisha Xu
VLM
77
1
0
29 Apr 2025
Style-Adaptive Detection Transformer for Single-Source Domain Generalized Object Detection
Jianhong Han
Yupei Wang
Liang Chen
ViT
42
0
0
29 Apr 2025
Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting
Hanxi Liu
Yifang Men
Zhouhui Lian
3DGS
33
0
0
29 Apr 2025
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
Woongyeong Yeo
Kangsan Kim
Soyeong Jeong
Jinheon Baek
Sung Ju Hwang
54
1
0
29 Apr 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
91
0
0
29 Apr 2025
Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization
Shuai Gong
C. Cui
Xiaolin Dong
Xiushan Nie
Lei Zhu
Xiaojun Chang
FedML
MoE
64
0
0
29 Apr 2025
T2ID-CAS: Diffusion Model and Class Aware Sampling to Mitigate Class Imbalance in Neck Ultrasound Anatomical Landmark Detection
Manikanta Varaganti
Amulya Vankayalapati
Nour Awad
Gregory R. Dion
Laura J. Brattain
DiffM
MedIm
67
0
0
29 Apr 2025
GLIP-OOD: Zero-Shot Graph OOD Detection with Graph Foundation Model
Haoyan Xu
Zhengtao Yao
Xuzhi Zhang
Zihan Wang
Langzhou He
Yushun Dong
Philip S. Yu
Mengyuan Li
Yue Zhao
OODD
VLM
69
0
0
29 Apr 2025
Partitioned Memory Storage Inspired Few-Shot Class-Incremental learning
Renye Zhang
Yimin Yin
Jinghua Zhang
CLL
59
0
0
29 Apr 2025
YoChameleon: Personalized Vision and Language Generation
Thao Nguyen
Krishna Kumar Singh
Jing Shi
Trung H. Bui
Yong Jae Lee
Yuheng Li
MLLM
82
0
0
29 Apr 2025
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu
Yizhou Wang
Xiangyu Yue
Xinzhu Ma
J. Guo
Dongzhan Zhou
Wanli Ouyang
Shixiang Tang
66
0
0
29 Apr 2025
A Survey on Parameter-Efficient Fine-Tuning for Foundation Models in Federated Learning
Jieming Bian
Yuanzhe Peng
Lei Wang
Yin Huang
Jie Xu
FedML
65
0
0
29 Apr 2025
TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks
S. Dell’Anna
Andrea Montibeller
Giulia Boato
62
0
0
29 Apr 2025
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Quentin Guimard
Moreno DÍncà
Massimiliano Mancini
Elisa Ricci
SSL
72
0
0
29 Apr 2025
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
57
0
0
28 Apr 2025
LR-IAD:Mask-Free Industrial Anomaly Detection with Logical Reasoning
Peijian Zeng
Feiyan Pang
Zhanbo Wang
Aimin Yang
74
0
0
28 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Sonia Joseph
Praneet Suresh
Lorenz Hufe
Edward Stevinson
Robert Graham
Yash Vadi
Danilo Bzdok
Sebastian Lapuschkin
Lee Sharkey
Blake A. Richards
72
0
0
28 Apr 2025
CLIP-KOA: Enhancing Knee Osteoarthritis Diagnosis with Multi-Modal Learning and Symmetry-Aware Loss Functions
Yejin Jeong
Donghun Lee
63
0
0
28 Apr 2025
CompleteMe: Reference-based Human Image Completion
Yu-Ju Tsai
Brian L. Price
Qing Liu
Luis Figueroa
D. Pakhomov
Zhihong Ding
Scott D. Cohen
Ming Yang
3DH
52
0
0
28 Apr 2025
Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
Muzammil Behzad
Guoying Zhao
VLM
51
0
0
28 Apr 2025
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection
Jianbo Gao
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
AAML
28
0
0
28 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
99
0
0
28 Apr 2025
WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution
Pietro Bongini
S. Mandelli
Andrea Montibeller
Mirko Casu
Orazio Pontorno
...
Paolo Bestagini
Irene Amerini
F. D. De Natale
Sebastiano Battiato
Mauro Barni
VLM
83
0
0
28 Apr 2025
ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies
Shubham Gandhi
Dhruv Shah
Manasi S. Patwardhan
L. Vig
Gautam M. Shroff
LLMAG
AI4CE
158
0
0
28 Apr 2025
RepText: Rendering Visual Text via Replicating
Haozhao Wang
Yongjun Xu
Yong Li
Jiajun Li
Chaowei Zhang
Jingchao Wang
Kejia Yang
Z. Chen
VLM
66
0
0
28 Apr 2025
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia
Valerie Zermatten
J. Castillo-Navarro
Pallavi Jain
D. Tuia
Diego Marcos
62
0
0
28 Apr 2025
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
Chia-Yu Hung
Qi Sun
Pengfei Hong
Amir Zadeh
Chuan Li
U-Xuan Tan
Navonil Majumder
Soujanya Poria
LM&Ro
42
1
0
28 Apr 2025
GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning
Juyi Sheng
Yangjun Liu
Sheng Xu
Zhixin Yang
Mengyuan Liu
59
0
0
28 Apr 2025
Masked Language Prompting for Generative Data Augmentation in Few-shot Fashion Style Recognition
Yuki Hirakawa
Ryotaro Shimizu
41
0
0
28 Apr 2025
Exploiting Inter-Sample Correlation and Intra-Sample Redundancy for Partially Relevant Video Retrieval
Junlong Ren
Gangjian Zhang
Y. Hu
Jian Shu
Haoran Wang
29
0
0
28 Apr 2025
mrCAD: Multimodal Refinement of Computer-aided Designs
William P. McCarthy
Saujas Vaduguru
K. Willis
Justin Matejka
Judith E. Fan
Daniel Fried
Yewen Pu
41
0
0
28 Apr 2025
Foundation Model-Driven Framework for Human-Object Interaction Prediction with Segmentation Mask Integration
Juhan Park
Kyungjae Lee
Hyung Jin Chang
Jungchan Cho
VLM
66
0
0
28 Apr 2025
SRMF: A Data Augmentation and Multimodal Fusion Approach for Long-Tail UHR Satellite Image Segmentation
Yulong Guo
Zilun Zhang
Yongheng Shang
Tiancheng Zhao
Shuiguang Deng
Yingchun Yang
Jianwei Yin
68
0
0
28 Apr 2025
Image Interpolation with Score-based Riemannian Metrics of Diffusion Models
Shinnosuke Saito
Takashi Matsubara
DiffM
82
1
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
84
0
0
28 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
L. Varshney
59
0
0
27 Apr 2025
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
72
0
0
27 Apr 2025
Previous
1
2
3
...
7
8
9
...
195
196
197
Next