arXiv: 2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Github (29177★)
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 1,722 papers shown
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Zhiwei Jia
Yuesong Nan
Huixi Zhao
Gengdai Liu
EGVM
186
1
0
22 Nov 2024
IterIS: Iterative Inference-Solving Alignment for LoRA Merging
Hongxu Chen
Runshi Li
Bowei Zhu
Zhen Wang
Long Chen
MoMe
156
2
0
21 Nov 2024
On the Fairness, Diversity and Reliability of Text-to-Image Generative Models
Jordan Vice
Naveed Akhtar
Leonid Sigal
Richard Hartley
Ajmal Mian
EGVM
132
0
0
21 Nov 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Yanjie Wang
Gangshan Wu
Tong He
Limin Wang
184
3
0
21 Nov 2024
Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction
Jordan Vice
Naveed Akhtar
Leonid Sigal
Ajmal Mian
DiffM
146
0
0
21 Nov 2024
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models
Taha Koleilat
Hojat Asgariandehkordi
H. Rivaz
Yiming Xiao
VLM
173
1
0
21 Nov 2024
AI-generated Image Detection: Passive or Watermark?
Moyang Guo
Yuepeng Hu
Zhengyuan Jiang
Zeyu Li
Amir Sadovnik
Arka Daw
Neil Zhenqiang Gong
197
1
0
20 Nov 2024
Label Distribution Shift-Aware Prediction Refinement for Test-Time Adaptation
M-U Jang
Hye Won Chung
TTA
528
0
0
20 Nov 2024
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh
Nimrod Shabtay
Wei Lin
Eli Schwartz
Hilde Kuehne
...
Leonid Karlinsky
James Glass
Assaf Arbelle
S. Ullman
Muhammad Jehanzeb Mirza
VLM
166
1
0
20 Nov 2024
LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement
Siwen Jiao
Yangyi Fang
Baoyun Peng
Wangqun Chen
Bharadwaj Veeravalli
180
5
0
20 Nov 2024
Find Any Part in 3D
Ziqi Ma
Yisong Yue
Georgia Gkioxari
3DPC
186
5
0
20 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
504
1
0
19 Nov 2024
Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
Rahul Garg
Trilok Padhi
Hemang Jain
Ugur Kursuncu
Ponnurangam Kumaraguru
146
4
0
19 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
281
1
0
18 Nov 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Ming Wang
VLM
115
5
0
18 Nov 2024
Conceptwm: A Diffusion Model Watermark for Concept Protection
Liangqi Lei
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
WIGM
132
2
0
18 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu
Sophia Ananiadou
454
2
0
17 Nov 2024
Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer
Shitong Shao
Zikai Zhou
Tian Ye
Lichen Bai
Zhiqiang Xu
Zeke Xie
DiffM
101
0
0
16 Nov 2024
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
Yimiao Zhou
Mengcheng Lan
Xiang Li
Yiping Ke
Xue Jiang
Qingyun Li
Xue Yang
Wayne Zhang
ObjD
VLM
238
7
0
16 Nov 2024
TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition
Jeonghyeok Do
Munchurl Kim
114
1
0
16 Nov 2024
C-DiffSET: Leveraging Latent Diffusion for SAR-to-EO Image Translation with Confidence-Guided Reliable Object Generation
Jeonghyeok Do
Jaehyup Lee
Munchurl Kim
DiffM
99
2
0
16 Nov 2024
ColorEdit: Training-free Image-Guided Color editing with diffusion model
Xingxi Yin
Zhi Li
Jingfeng Zhang
Chenglin Li
Yin Zhang
DiffM
146
0
0
15 Nov 2024
DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models
Yongdong Wang
Runze Xiao
Jun Younes Louhi Kasahara
Ryosuke Yajima
Keiji Nagatani
Atsushi Yamashita
Hajime Asama
98
7
0
13 Nov 2024
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
Moran Yanuka
Assaf Ben-Kish
Yonatan Bitton
Idan Szpektor
Raja Giryes
VLM
116
3
0
13 Nov 2024
Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task
H. Haresamudram
Chi Ian Tang
Sungho Suh
P. Lukowicz
Thomas Ploetz
173
3
0
11 Nov 2024
Deep Active Learning in the Open World
Tian Xie
Jifan Zhang
Haoyue Bai
R. Nowak
VLM
412
3
0
10 Nov 2024
KMM: Key Frame Mask Mamba for Extended Motion Generation
Zeyu Zhang
Hang Gao
Akide Liu
Qi Chen
Feng Chen
...
Hao Tang
Zhenming Li
Zhongwen Zhou
Hao Tang
Bohan Zhuang
Mamba
VGen
93
3
0
10 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
72
0
0
09 Nov 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
Fahad Shahbaz Khan
Salman Khan
MLLM
VGen
VLM
110
9
0
07 Nov 2024
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
Koichi Namekata
Sherwin Bahmani
Ziyi Wu
Yash Kant
Igor Gilitschenski
David B. Lindell
VGen
150
16
0
07 Nov 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
131
22
0
07 Nov 2024
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Jingwei Xu
Chenyu Wang
Zibo Zhao
Wen Liu
Yi-An Ma
Shenghua Gao
113
18
0
07 Nov 2024
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization
Huayang Huang
Yu Wu
Qian Wang
DiffM
WIGM
100
7
0
06 Nov 2024
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case Study in Tabletop Role-Playing Games Soundtracks
Felipe Marra
Lucas N. Ferreira
94
0
0
06 Nov 2024
Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection
Pengfei Lyu
Pak-Hei Yeung
Xiufei Cheng
Xiaosheng Yu
Chengdong Wu
Jagath C. Rajapakse
93
0
0
06 Nov 2024
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi
Pietro Astolfi
Melissa Hall
Reyhane Askari Hemmat
Yohann Benchetrit
...
Matthew Muckley
Karteek Alahari
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
AI4CE
VLM
128
4
0
05 Nov 2024
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Yangning Li
Hai-Tao Zheng
Xinyu Wang
Yong Jiang
Zhen Zhang
...
Hui Wang
Hai-Tao Zheng
Pengjun Xie
Philip S. Yu
Fei Huang
140
23
0
05 Nov 2024
Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models
Anjith George
Sébastien Marcel
109
2
0
04 Nov 2024
One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering
Deepayan Das
Davide Talon
Massimiliano Mancini
Yiming Wang
Elisa Ricci
120
0
0
04 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin
Chankyu Lee
Mohammad Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Ming-Yu Liu
259
19
0
04 Nov 2024
Not Just Object, But State: Compositional Incremental Learning without Forgetting
Yanyi Zhang
Binglin Qiu
Qi Jia
Yu Liu
Ran He
CLL
98
0
0
04 Nov 2024
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
Sejoon Oh
Yiqiao Jin
Megha Sharma
Donghyun Kim
Eric Ma
Gaurav Verma
Srijan Kumar
113
7
0
03 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
211
1
0
02 Nov 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
221
3
0
01 Nov 2024
Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
Shengxun Wei
Zan Gao
Yibo Zhao
Weili Guan
Shengyong Chen
127
2
0
01 Nov 2024
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
Gi-Cheon Kang
Junghyun Kim
Kyuhwan Shim
Jun Ki Lee
Byoung-Tak Zhang
LM&Ro
249
2
1
01 Nov 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
98
2
0
31 Oct 2024
FRoundation: Are Foundation Models Ready for Face Recognition?
Tahar Chettaoui
Naser Damer
Fadi Boutros
CVBM
81
8
0
31 Oct 2024
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang
Maixuan Xue
Xinran Liu
Zheng Pan
Xing Wei
206
2
0
31 Oct 2024
A Geometric Framework for Understanding Memorization in Generative Models
Brendan Leigh Ross
Hamidreza Kamkari
Tongzi Wu
Rasa Hosseinzadeh
Zhaoyan Liu
George Stein
Jesse C. Cresswell
Gabriel Loaiza-Ganem
134
9
0
31 Oct 2024