2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 9,975 papers shown
Title
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
72
0
0
27 Apr 2025
HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer's Disease
Qiuhui Chen
Jintao Wang
Gang Wang
Yi Hong
52
0
0
27 Apr 2025
Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation
Lei Zhong
Chuan Guo
Yiming Xie
Jiawei Wang
Changjian Li
VGen
52
0
0
27 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
L. Varshney
59
0
0
27 Apr 2025
Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID
De-Chun Cheng
Lingfeng He
N. Wang
Dingwen Zhang
X. Gao
34
0
0
27 Apr 2025
Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting
Xiaofeng Jin
Yan Fang
Matteo Frosi
Jianfei Ge
Jiangjian Xiao
Matteo Matteucci
3DGS
65
0
0
27 Apr 2025
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
Xiaofeng Jin
Matteo Frosi
Matteo Matteucci
181
0
0
27 Apr 2025
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki
Anabia Sohail
I. I. Ganapathi
B. Alawode
Asim Khan
Sajid Javed
Naoufel Werghi
Mohammed Bennamoun
Arif Mahmood
66
0
0
26 Apr 2025
ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding
Santosh Rajagopalan
Jonathan Vronsky
Songbai Yan
S. Alireza Golestaneh
Shubhra Chandra
Min Zhou
66
0
0
26 Apr 2025
Video CLIP Model for Multi-View Echocardiography Interpretation
Ryo Takizawa
Satoshi Kodera
Tempei Kabayama
Ryo Matsuoka
Yuta Ando
Yuto Nakamura
Haruki Settai
Norihiko Takeda
42
0
0
26 Apr 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Xiaozhong Liu
Hangyu Guo
Ranjie Duan
Xingyuan Bu
Yancheng He
...
Yingshui Tan
Yanan Wu
Jihao Gu
Heng Chang
Jun Zhu
MLLM
178
0
0
25 Apr 2025
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
Jiahao Zhang
Bowen Wang
Hong Liu
Liangzhi Li
Yuta Nakashima
Hajime Nagahara
VLM
104
0
0
25 Apr 2025
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang
Wenliang Zheng
Aashrith Madasu
Peng Shi
Ryo Kamoi
...
Ranran Haoran Zhang
Avitej Iyer
Renze Lou
Wenpeng Yin
Rui Zhang
68
0
0
25 Apr 2025
TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation
Shintaro Ozaki
Kazuki Hayashi
Yusuke Sakai
Jingun Kwon
Hidetaka Kamigaito
Katsuhiko Hayashi
Manabu Okumura
Taro Watanabe
VLM
88
0
0
25 Apr 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
106
0
0
25 Apr 2025
Combating the Bucket Effect: Multi-Knowledge Alignment for Medication Recommendation
Xiang Li
Haixu Ma
Guanyong Wu
Shi Mu
Chong Li
Shunpan Liang
41
0
0
25 Apr 2025
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
Minjae Kang
Martim Brandão
64
0
0
25 Apr 2025
STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting
Yunze Deng
Haijun Xiong
Bin Feng
Xinyu Wang
Wei Liu
3DGS
47
0
0
25 Apr 2025
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng
Q. Yang
Yu-Ming Tang
Shenghao Fu
Kun-Yu Lin
Xihan Wei
Wei-Shi Zheng
45
0
0
25 Apr 2025
ShapeSpeak: Body Shape-Aware Textual Alignment for Visible-Infrared Person Re-Identification
Shuanglin Yan
Neng Dong
Shuang Li
Rui Yan
Hao Tang
Jing Qin
160
0
0
25 Apr 2025
SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology
Elena Plekhanova
Damien Robert
Johannes Dollinger
Emilia Arens
Philipp Brun
Jan Dirk Wegner
Niklaus Zimmermann
24
0
0
25 Apr 2025
Generalization Capability for Imitation Learning
Yixiao Wang
160
0
0
25 Apr 2025
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Shivam Duggal
Yushi Hu
Oscar Michel
Aniruddha Kembhavi
William T. Freeman
Noah A. Smith
Ranjay Krishna
Antonio Torralba
Ali Farhadi
Wei-Chiu Ma
EGVM
ELM
80
0
0
25 Apr 2025
Semantic-Aware Contrastive Fine-Tuning: Boosting Multimodal Malware Classification with Discriminative Embeddings
Ivan Montoya Sanchez
Shaswata Mitra
Aritran Piplai
Sudip Mittal
49
0
0
25 Apr 2025
Memory Reviving, Continuing Learning and Beyond: Evaluation of Pre-trained Encoders and Decoders for Multimodal Machine Translation
Zhuang Yu
Shiliang Sun
Jing Zhao
Tengfei Song
Hao Yang
48
0
0
25 Apr 2025
POET: Prompt Offset Tuning for Continual Human Action Adaptation
Prachi Garg
Joseph K J
V. Balasubramanian
Necati Cihan Camgöz
Chengde Wan
Kenrick Kin
Weiguang Si
Shugao Ma
Fernando de la Torre
69
0
0
25 Apr 2025
Multimodal graph representation learning for website generation based on visual sketch
Tung D. Vu
Chung Hoang
Truong-Son Hy
3DV
56
0
0
25 Apr 2025
From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval
Yabing Wang
Zhuotao Tian
Qingpei Guo
Zheng Qin
Sanping Zhou
Ming Yang
Le Wang
141
0
0
25 Apr 2025
What is the Added Value of UDA in the VFM Era?
B. B. Englert
Tommie Kerssies
Gijs Dubbelman
46
0
0
25 Apr 2025
Token Sequence Compression for Efficient Multimodal Computing
Yasmine Omri
Parth Shroff
Thierry Tambe
58
0
0
24 Apr 2025
Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
M. Zarlenga
Gabriele Dominici
Pietro Barbiero
Z. Shams
M. Jamnik
KELM
191
0
0
24 Apr 2025
Adaptive Orchestration of Modular Generative Information Access Systems
Mohanna Hoveyda
Harrie Oosterhuis
A. D. Vries
Maarten de Rijke
Faegheh Hasibi
40
0
0
24 Apr 2025
CLIPSE -- a minimalistic CLIP-based image search engine for research
Steve Göring
CLIP
VLM
34
0
0
24 Apr 2025
Class-Conditional Distribution Balancing for Group Robust Classification
Miaoyun Zhao
Qiang Zhang
C. Li
70
1
0
24 Apr 2025
Symbolic Representation for Any-to-Any Generative Tasks
Jianfei Chen
Xiaoye Zhu
Yanjie Wang
Tianyang Liu
Xinhui Chen
...
Yifei Ke
Jiaheng Liu
Yiwen Yuan
Julian McAuley
Li Li
DiffM
40
0
0
24 Apr 2025
FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model
Kaicheng Pang
Xingxing Zou
W. Wong
29
0
0
24 Apr 2025
Text-to-Image Alignment in Denoising-Based Models through Step Selection
P. Grimal
Hervé Le Borgne
Olivier Ferret
DiffM
EGVM
48
0
0
24 Apr 2025
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation
Ling You
Wenxuan Huang
Xinni Xie
Xiangyi Wei
Bangyan Li
Shaohui Lin
Yang Li
Changbo Wang
VGen
181
1
0
24 Apr 2025
CIVIL: Causal and Intuitive Visual Imitation Learning
Yinlong Dai
Robert Ramirez Sanchez
Ryan Jeronimus
Shahabedin Sagheb
Cara M. Nunez
Heramb Nemlekar
Dylan P. Losey
74
1
0
24 Apr 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
73
1
0
24 Apr 2025
A Genealogy of Multi-Sensor Foundation Models in Remote Sensing
Kevin Lane
Morteza Karimzadeh
41
0
0
24 Apr 2025
Dual Prompting Image Restoration with Diffusion Transformers
Dehong Kong
Fan Li
Zhixin Wang
Jiaqi Xu
Renjing Pei
W. J. Li
Wenqi Ren
DiffM
69
0
0
24 Apr 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin
Hagai Taitelbaum
Yonatan Bitton
Brian Gordon
Michal Sokolik
...
Almog Gueta
Royi Rassin
Itay Laish
Dani Lischinski
Idan Szpektor
EGVM
VGen
43
0
0
24 Apr 2025
Enhancing Variational Autoencoders with Smooth Robust Latent Encoding
Hyomin Lee
Minseon Kim
Sangwon Jang
Jongheon Jeong
Sung Ju Hwang
DiffM
AAML
39
1
0
24 Apr 2025
Step1X-Edit: A Practical Framework for General Image Editing
Shixuan Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
Xuzhi Zhang
Gang Yu
Daxin Jiang
DiffM
111
4
0
24 Apr 2025
A Simple DropConnect Approach to Transfer-based Targeted Attack
Tongrui Su
Qingbin Li
Shengyu Zhu
Wei Chen
Xueqi Cheng
AAML
69
0
0
24 Apr 2025
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Lynn Cherif
Flemming Kondrup
David Venuto
Ankit Anand
Doina Precup
Khimya Khetarpal
LM&Ro
54
0
0
24 Apr 2025
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
Lakshita Agarwal
Bindu Verma
ViT
29
0
0
23 Apr 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
Wenxuan Li
Hang Zhao
Zhiyuan Yu
Yu Du
Qin Zou
Ruizhen Hu
K. Xu
SSL
83
1
0
23 Apr 2025
4D Multimodal Co-attention Fusion Network with Latent Contrastive Alignment for Alzheimer's Diagnosis
Yuxiang Wei
Wenjie Qu
Xi Xiao
Tianyang Wang
Xuben Wang
Vince D. Calhoun
152
0
0
23 Apr 2025