2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Github (29177★)
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 1,722 papers shown
Superpose Singular Features for Model Merging
Haiquan Qiu
You Wu
Quanming Yao
MoMe
147
0
0
15 Feb 2025
Adaptive Neural Networks for Intelligent Data-Driven Development
Youssef Shoeb
Azarm Nowzad
Hanno Gottschalk
232
2
0
14 Feb 2025
Exploring the Needs of Practising Musicians in Co-Creative AI Through Co-Design
Stephen James Krol
Maria Teresa Llano Rodriguez
Miguel Loor Paredes
116
0
0
13 Feb 2025
One-shot Federated Learning Methods: A Practical Guide
Xiang Liu
Zhenheng Tang
Xia Li
Yijun Song
Sijie Ji
Zemin Liu
Bo Han
Linshan Jiang
Jialin Li
FedML
148
1
0
13 Feb 2025
Object-Centric Latent Action Learning
Albina Klepach
Alexander Nikulin
Ilya Zisman
Denis Tarasov
Alexander Derevyagin
Andrei Polubarov
Nikita Lyubaykin
Vladislav Kurenkov
115
0
0
13 Feb 2025
Visual Graph Question Answering with ASP and LLMs for Language Parsing
Jakob Johannes Bauer
Thomas Eiter
Nelson Higuera Ruiz
J. Oetsch
GNN
136
0
0
13 Feb 2025
Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
Shiryu Ueno
Yoshikazu Hayashi
Shunsuke Nakatsuka
Yusei Yamada
Hiroaki Aizawa
K. Kato
MLLM
VLM
175
0
0
13 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
314
7
0
12 Feb 2025
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Shivansh Patel
Xinchen Yin
Wenlong Huang
Shubham Garg
H. Nayyeri
Li Fei-Fei
Svetlana Lazebnik
Yongqian Li
158
1
0
12 Feb 2025
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi
Chen Li
Tieliang Gong
Yefeng Zheng
Huazhu Fu
VLM
160
12
0
12 Feb 2025
Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions
Prajwal Gatti
Kshitij Parikh
Dhriti Prasanna Paul
Manish Gupta
Anand Mishra
198
2
0
12 Feb 2025
MatSwap: Light-aware material transfers in images
Ivan Lopes
Valentin Deschaintre
Yannick Hold-Geoffroy
Raoul de Charette
DiffM
212
0
0
11 Feb 2025
TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation
Navid Rajabi
Jana Kosecka
LM&Ro
3DV
123
0
0
11 Feb 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Zhihui Guo
Zhiming Liu
Qianli Ren
Yuexian Zou
181
6
0
10 Feb 2025
Conformal Predictions for Human Action Recognition with Vision-Language Models
Tim Bary
Clément Fuchs
Benoît Macq
VLM
129
0
0
10 Feb 2025
Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution
Vlad Hosu
Lorenzo Agnolucci
Daisuke Iso
Dietmar Saupe
116
0
0
10 Feb 2025
MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models
Kamil Garifullin
Maxim Nikolaev
Andrey Kuznetsov
Aibek Alanov
108
0
0
10 Feb 2025
Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation
Emanuele Mule
Matteo Pannacci
Ali Ghasemi Goudarzi
Francesco Pro
Lorenzo Papa
Luca Maiano
Irene Amerini
129
0
0
10 Feb 2025
Learning Clustering-based Prototypes for Compositional Zero-shot Learning
Hongyu Qu
Jianan Wei
Xiangbo Shu
Wenguan Wang
VLM
145
1
0
10 Feb 2025
History-Guided Video Diffusion
Kiwhan Song
Boyuan Chen
Max Simchowitz
Yilun Du
Russ Tedrake
Vincent Sitzmann
VGen
189
18
0
10 Feb 2025
CoS: Chain-of-Shot Prompting for Long Video Understanding
Jian Hu
Zixu Cheng
Chenyang Si
Wei Li
Shaogang Gong
101
8
0
10 Feb 2025
Model Diffusion for Certifiable Few-shot Transfer Learning
Fady Rezk
Royson Lee
Henry Gouk
Timothy M. Hospedales
Minyoung Kim
124
0
0
10 Feb 2025
Unleashing the Potential of Pre-Trained Diffusion Models for Generalizable Person Re-Identification
Jiachen Li
Xiaojin Gong
DiffM
177
0
0
10 Feb 2025
Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education
Yanhao Jia
Xinyi Wu
Hao Li
Qinglin Zhang
Yuxiao Hu
Shuai Zhao
Wenqi Fan
156
5
0
09 Feb 2025
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation
Zhiyong Yang
Keyang Lu
Chao Zhang
Jiaxing Qi
Hanqi Jiang
...
Yifan Xu
Mingzhe Xing
Zhen Xiao
Jieyi Long
Xiangde Liu
105
5
0
09 Feb 2025
Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
3DGS
3DPC
AI4CE
141
1
0
09 Feb 2025
Demystifying Catastrophic Forgetting in Two-Stage Incremental Object Detector
Qirui Wu
Shizhou Zhang
De Cheng
Yinghui Xing
Di Xu
Peng Wang
Yanning Zhang
ObjD
207
0
0
08 Feb 2025
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Vivek Myers
Bill Chunyuan Zheng
Anca Dragan
Kuan Fang
Sergey Levine
174
1
0
08 Feb 2025
Demonstrating CavePI: Autonomous Exploration of Underwater Caves by Semantic Guidance
Alankrit Gupta
Adnan Abdullah
Xianyao Li
Vaishnav Ramesh
Ioannis M. Rekleitis
Md Jahidul Islam
119
1
0
07 Feb 2025
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment
Minh-Quan Le
Gaurav Mittal
Tianjian Meng
A S M Iftekhar
Vishwas Suryanarayanan
Barun Patra
Dimitris Samaras
Mei Chen
DiffM
125
0
0
07 Feb 2025
Interpretable Failure Detection with Human-Level Concepts
Kien X. Nguyen
Tang Li
Xi Peng
115
1
0
07 Feb 2025
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Andrew D. Bagdanov
CLIP
VLM
160
5
0
06 Feb 2025
Decoder-Only LLMs are Better Controllers for Diffusion Models
Ziyi Dong
Yao Xiao
Pengxu Wei
Liang Lin
DiffM
214
0
0
06 Feb 2025
Augmented Conditioning Is Enough For Effective Training Image Generation
Jiahui Chen
Amy Zhang
Adriana Romero-Soriano
DiffM
VLM
169
0
0
06 Feb 2025
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
Mennatullah Siam
VLM
150
1
0
06 Feb 2025
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
Zhekai Du
Yinjie Min
Jingjing Li
Ke Lu
Changliang Zou
Liuhua Peng
Tingjin Chu
Mingming Gong
423
2
0
05 Feb 2025
Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas
Muneeza Azmat
R. Horesh
Mikhail Yurochkin
169
1
0
05 Feb 2025
Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via In-the-wild Cascading Flow Optimization
Yixiao Chen
Shikun Sun
Jianshu Li
Ruoyu Li
Zhe Li
Junliang Xing
AAML
278
0
0
04 Feb 2025
Al-Khwarizmi: Discovering Physical Laws with Foundation Models
Christopher E. Mower
Haitham Bou-Ammar
AI4CE
174
2
0
03 Feb 2025
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai
Dilin Wang
Mihir Jain
N. Sarafianos
Arthur Chen
Srinath Sridhar
Aayush Prakash
3DGS
160
1
0
03 Feb 2025
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
Haibo Tong
Zhaoyang Wang
Zhe Chen
Haonian Ji
Shi Qiu
...
Peng Xia
Mingyu Ding
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
VGen
197
4
0
03 Feb 2025
Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection
Tianlin Zhang
En Yu
Yi Shao
Shuai Li
147
0
0
03 Feb 2025
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao
Haoxian Chen
Ji Zhang
D. Yao
Wenpin Tang
110
1
0
03 Feb 2025
SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models
Jiawen Zhang
Kejia Chen
Zunlei Feng
Jian Lou
Mingli Song
Qingbin Liu
Xiaoyu Yang
AAML
SILM
FedML
129
1
0
02 Feb 2025
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
Can Jin
Ying Li
Mingyu Zhao
Shiyu Zhao
Zhenting Wang
Xiaoxiao He
Ligong Han
Tong Che
Dimitris N. Metaxas
VPVLM
VLM
293
2
0
02 Feb 2025
Vision-centric Token Compression in Large Language Model
Ling Xing
Alex Jinpeng Wang
Rui Yan
Xiangbo Shu
Jinhui Tang
VLM
124
0
0
02 Feb 2025
Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
Jingming Xia
Guanqun Cao
Guang Ma
Yiben Luo
Qinzhao Li
John Oyekan
MDE
101
0
0
01 Feb 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang
Zaiyi Zheng
Zhengzhang Chen
Wenlin Yao
191
1
0
01 Feb 2025
LiDAR Loop Closure Detection using Semantic Graphs with Graph Attention Networks
Liudi Yang
Ruben Mascaro
Ignacio Alzugaray
Sai Manoj Prakhya
Marco Karrer
Ziyuan Liu
M. Chli
3DPC
138
1
0
31 Jan 2025
Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields
Xingyu Miao
Haoran Duan
Yang Bai
Tejal Shah
Jun Song
Yang Long
R. Ranjan
Ling Shao
154
5
0
31 Jan 2025