Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (29177★)
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 1,722 papers shown
Title
A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches
Luca Ciampi
Ali Azmoudeh
Elif Ecem Akbaba
Erdi Sarıtaş
Ziya Ata Yazıcı
H. K. Ekenel
Giuseppe Amato
Fabrizio Falchi
172
0
0
31 Jan 2025
Topological Signatures of Adversaries in Multimodal Alignments
Minh Vu
Geigh Zollicoffer
Huy Mai
B. Nebgen
Boian S. Alexandrov
Manish Bhattarai
AAML
106
1
0
29 Jan 2025
Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models
Behraj Khan
T. Syed
474
1
0
29 Jan 2025
Boosting Weak Positives for Text Based Person Search
Akshay Modi
Ashhar Aziz
Nilanjana Chatterjee
A V Subramanyam
101
0
0
29 Jan 2025
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Bartosz Cywiński
Kamil Deja
DiffM
122
9
0
29 Jan 2025
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
J. P. Muñoz
Jinjie Yuan
Nilesh Jain
Mamba
125
2
0
28 Jan 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
192
0
0
28 Jan 2025
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
Chenguo Lin
Panwang Pan
Bangbang Yang
Zeming Li
Yadong Mu
3DGS
164
9
0
28 Jan 2025
Can Pose Transfer Models Generate Realistic Human Motion?
Vaclav Knapp
Matyas Bohacek
422
1
0
28 Jan 2025
MADation: Face Morphing Attack Detection with Foundation Models
Eduarda Caldeira
Guray Ozgur
Tahar Chettaoui
Marija Ivanovska
Peter Peer
Fadi Boutros
Vitomir Štruc
Naser Damer
CVBM
90
2
1
28 Jan 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Min Zhang
LM&MA
AILaw
223
175
0
28 Jan 2025
Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts
Wenju Sun
Qingyong Li
Wen Wang
Yangli-ao Geng
Boyang Li
185
5
0
28 Jan 2025
Visual Generation Without Guidance
Huayu Chen
Kai Jiang
Kaiwen Zheng
Jianfei Chen
Hang Su
Jun Zhu
154
2
0
28 Jan 2025
BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity
Zahra Gharaee
Scott C. Lowe
ZeMing Gong
Pablo Millán Arias
Nicholas Pellegrino
...
Lila Kari
Dirk Steinke
Graham W. Taylor
Paul Fieguth
Angel X. Chang
109
11
0
28 Jan 2025
An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control
Aosong Feng
Weikang Qiu
Jinbin Bai
Xiao Zhang
Zhen Dong
Kaicheng Zhou
Rex Ying
Leandros Tassiulas
DiffM
106
6
0
28 Jan 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
231
127
0
28 Jan 2025
MATCHA:Towards Matching Anything
Fei Xue
Sven Elflein
Laura Leal-Taixe
Qunjie Zhou
127
1
0
28 Jan 2025
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin
Mingbao Lin
Luxi Lin
Rongrong Ji
97
24
0
28 Jan 2025
Transformer-Based Multimodal Knowledge Graph Completion with Link-Aware Contexts
Haodi Ma
Dzmitry Kasinets
Daisy Zhe Wang
111
0
0
28 Jan 2025
Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds
Xiaoyu Xiang
Liat Sless Gorelik
Yuchen Fan
Omri Armstrong
Forrest N. Iandola
Yilei Li
Ita Lifshitz
Rakesh Ranjan
3DGS
DiffM
173
5
0
28 Jan 2025
MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field
Zijian Győző Yang
Zhongwei Qiu
Chang Xu
Dongmei Fu
128
2
0
28 Jan 2025
Large Language Model Distilling Medication Recommendation Model
Qidong Liu
Xian Wu
Xiangyu Zhao
Yuanshao Zhu
Zijian Zhang
Feng Tian
Yefeng Zheng
LM&MA
125
20
0
28 Jan 2025
sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging
Jingyuan Chen
Yuan Yao
Mie Anderson
Natalie Hauglund
Celia Kjaerby
Verena Untiet
Maiken Nedergaard
Jiebo Luo
140
2
0
28 Jan 2025
Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition
Jielong Tang
Zhenxing Wang
Ziyang Gong
Jianxing Yu
Shuang Wang
Jian Yin
129
0
0
28 Jan 2025
B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable
Shreyash Arya
Sukrut Rao
Moritz Bohle
Bernt Schiele
167
3
0
28 Jan 2025
Multi-Modality Transformer for E-Commerce: Inferring User Purchase Intention to Bridge the Query-Product Gap
Srivatsa Mallapragada
Ying Xie
Varsha Rani Chawan
Zeyad Hailat
Yuanbo Wang
80
0
0
28 Jan 2025
Turn That Frown Upside Down: FaceID Customization via Cross-Training Data
Shuhe Wang
Xiaoya Li
Xiaofei Sun
G. Wang
Tianwei Zhang
Jiwei Li
Eduard H. Hovy
103
1
0
28 Jan 2025
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
Jeremy Irvin
Emily Ruoyu Liu
Joyce Chuyi Chen
Ines Dormoy
Jinyoung Kim
Samar Khanna
Zhuo Zheng
Stefano Ermon
MLLM
VLM
160
12
0
28 Jan 2025
TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning
Miaoge Li
Jingcai Guo
Richard Yi Da Xu
Dongsheng Wang
Xiaofeng Cao
Zhijie Rao
Song Guo
CoGe
151
3
0
28 Jan 2025
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
Mai A. Shaaban
Adnan Khan
Mohammad Yaqub
LM&MA
120
2
0
28 Jan 2025
Rethinking the Bias of Foundation Model under Long-tailed Distribution
Jiahao Chen
Bin Qin
Jiangmeng Li
Hao Chen
Fuchun Sun
154
0
0
27 Jan 2025
Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge
Anh-Kiet Duong
Petra Gomez-Krämer
107
2
0
27 Jan 2025
Scaling laws for decoding images from brain activity
Hubert J. Banville
Yohann Benchetrit
Stéphane DÁscoli
Jérémy Rapin
J. King
MedIm
111
0
0
25 Jan 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
149
0
0
25 Jan 2025
CVOCSemRPL: Class-Variance Optimized Clustering, Semantic Information Injection and Restricted Pseudo Labeling based Improved Semi-Supervised Few-Shot Learning
Rhythm Baghel
Souvik Maji
Pratik Mazumder
110
0
0
24 Jan 2025
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Runyi Hu
Jing Zhang
You Li
Jiwei Li
Qing Guo
Han Qiu
Tianwei Zhang
WIGM
VGen
167
8
0
24 Jan 2025
Towards Scalable Topological Regularizers
Hiu-Tung Wong
Darrick Lee
Hong Yan
BDL
109
0
0
24 Jan 2025
PAID: A Framework of Product-Centric Advertising Image Design
Hongyu Chen
Min Zhou
Jing Jiang
Jiale Chen
Yang Lu
Bo Xiao
T. Ge
Bo Zheng
DiffM
VLM
91
0
0
24 Jan 2025
LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps
Andrey Palaev
Adil Mehmood Khan
S. M. Ahsan Kazmi
DiffM
113
0
0
23 Jan 2025
Improving Video Generation with Human Feedback
Jie Liu
Gongye Liu
Jiajun Liang
Ziyang Yuan
Xiaokun Liu
...
Pengfei Wan
Di Zhang
Kun Gai
Yujiu Yang
Wanli Ouyang
VGen
EGVM
145
26
0
23 Jan 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang
Tao Feng
Lilong Xue
Yuandong Wang
Yuxiao Dong
J. Tang
202
12
0
23 Jan 2025
MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance
Wooseok Song
Seunggyu Chang
Jaejun Yoo
DiffM
93
0
0
23 Jan 2025
On Storage Neural Network Augmented Approximate Nearest Neighbor Search
Taiga Ikeda
Daisuke Miyashita
J. Deguchi
78
0
0
23 Jan 2025
CGI: Identifying Conditional Generative Models with Example Images
Zhi Zhou
Hao-Zhe Tan
Peng-Xiao Song
Lan-Zhe Guo
DiffM
78
0
0
23 Jan 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
101
1
0
22 Jan 2025
Slot-BERT: Self-supervised Object Discovery in Surgical Video
Guiqiu Liao
M. Jogan
Marcel Hussing
Kenta Nakahashi
Kazuhiro Yasufuku
Amin Madani
Eric Eaton
Daniel A. Hashimoto
446
0
0
21 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Wentao Zhang
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
207
25
0
21 Jan 2025
Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models
Tabinda Aman
Mohammad Nadeem
S. Sohail
Mohammad Anas
Min Zhang
VLM
151
1
0
21 Jan 2025
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Zibo Zhao
Zeqiang Lai
Qingxiang Lin
Yunfei Zhao
Haolin Liu
...
Jingwei Huang
Chunchao Guo
Jie Jiang
Jingwei Huang
Chunchao Guo
255
45
0
21 Jan 2025
See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization
Zongqi He
Zhe Xiao
Kin-Chung Chan
Yushen Zuo
Jun Xiao
Kin-Man Lam
3DGS
153
0
0
20 Jan 2025
Previous
1
2
3
...
12
13
14
...
33
34
35
Next