arXiv: 2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 9,748 papers shown
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng
Haoyu Zhang
Meng Liu
Weili Guan
Liqiang Nie
41
0
0
07 May 2025
Interpretable Zero-shot Learning with Infinite Class Concepts
Zihan Ye
Shreyank N Gowda
Shiming Chen
Yaochu Jin
Kaizhu Huang
Xiaobo Jin
VLM
37
0
0
06 May 2025
Task Reconstruction and Extrapolation for $\pi_0$ using Text Latent
Quanyi Li
40
0
0
06 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
68
0
0
06 May 2025
Panoramic Out-of-Distribution Segmentation
Mengfei Duan
Kailun Yang
Y. Zhang
Yihong Cao
Fei Teng
Kai Luo
Jiaming Zhang
Zhiyong Li
Shutao Li
59
0
0
06 May 2025
ChannelExplorer: Exploring Class Separability Through Activation Channel Visualization
Md Rahat-uz-Zaman
Bei Wang
Paul Rosen
21
0
0
06 May 2025
VISLIX: An XAI Framework for Validating Vision Models with Slice Discovery and Analysis
Xinyuan Yan
Xiwei Xuan
Jorge Henrique Piazentin Ono
Jiajing Guo
V. Mohanty
Shekar Arvind Kumar
Liang Gou
Bei Wang
Liu Ren
42
0
0
06 May 2025
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
Sameer Malik
Moyuru Yamada
Ayush Singh
Dishank Aggarwal
150
0
0
06 May 2025
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
L. Wang
Senmao Li
Fei Yang
Jianye Wang
Ziheng Zhang
Yong-Jin Liu
Y. Wang
Jian Yang
DiffM
61
0
0
06 May 2025
Robust Fairness Vision-Language Learning for Medical Image Analysis
Sparsh Bansal
Mingyang Wu
Xin Wang
S. Hu
VLM
50
0
0
06 May 2025
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
Rui Lan
Y. Bai
Xu Duan
M. Li
Lei Sun
X. Chu
DiffM
140
0
0
06 May 2025
1st Place Solution of WWW 2025 EReL@MIR Workshop Multimodal CTR Prediction Challenge
Junwei Xu
Zehao Zhao
Xiaoyu Hu
Zhenjie Song
35
0
0
06 May 2025
RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning
Liam Boyle
Nicolas Baumann
Paviththiren Sivasothilingam
Michele Magno
Luca Benini
LM&Ro
LRM
51
0
0
06 May 2025
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
Shiyi Zhang
Junhao Zhuang
Zhaoyang Zhang
Ying Shan
Yansong Tang
VGen
107
0
0
06 May 2025
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon
Federico Girella
Ziyue Liu
Marco Cristani
Yiming Wang
VLM
52
0
0
06 May 2025
PiCo: Enhancing Text-Image Alignment with Improved Noise Selection and Precise Mask Control in Diffusion Models
Chang Xie
Chenyi Zhuang
Pan Gao
VLM
37
0
0
06 May 2025
DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor
Wei-Ting Chen
Yu-Jiet Vong
Yi-Tsung Lee
Sy-Yen Kuo
Qiang Gao
Sizhuo Ma
Jian Wang
163
0
0
06 May 2025
Artificial Behavior Intelligence: Technology, Challenges, and Future Directions
Kanghyun Jo
Jehwan Choi
Kwanho Kim
Seongmin Kim
Duy-Linh Nguyen
Xuan-Thuy Vo
Adri Priadana
Tien-Dat Tran
AI4CE
48
0
0
06 May 2025
Distribution-Conditional Generation: From Class Distribution to Creative Generation
Fu Feng
Yucheng Xie
Xu Yang
Jing Wang
Xin Geng
DiffM
31
0
0
06 May 2025
Safer Prompts: Reducing IP Risk in Visual Generative AI
Lena Reissinger
Yuanyuan Li
Anna-Carolina Haensch
Neeraj Sarna
33
0
0
06 May 2025
Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning
François Role
Sébastien Meyer
Victor Amblard
VLM
50
0
0
06 May 2025
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision
Linhan Cao
Wei Sun
Kaiwei Zhang
Yicong Peng
Guangtao Zhai
Xiongkuo Min
52
0
0
06 May 2025
Mitigating Image Captioning Hallucinations in Vision-Language Models
Fei Zhao
Chengcui Zhang
Runlin Zhang
Tianyang Wang
Xi Li
VLM
44
0
0
06 May 2025
CXR-AD: Component X-ray Image Dataset for Industrial Anomaly Detection
Haoyu Bai
Jie Wang
Gaomin Li
Xianrui Li
Xiaohu Zhang
Xia Yang
41
0
0
06 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
42
0
0
06 May 2025
EOPose: Exemplar-based object reposing using Generalized Pose Correspondences
Sarthak Mehrotra
Rishabh Jain
Mayur Hemani
Balaji Krishnamurthy
Mausoom Sarkar
51
0
0
06 May 2025
ALMA: Aggregated Lipschitz Maximization Attack on Auto-encoders
Chethan Krishnamurthy Ramanaik
Arjun Roy
Eirini Ntoutsi
AAML
32
0
0
06 May 2025
Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges
Hao Xu
Arbind Agrahari Baniya
Sam Well
Mohamed Reda Bouadjenek
Richard Dazeley
S. Aryal
AI4TS
29
0
0
06 May 2025
Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant
Haonan Wang
Jiaji Mao
Lehan Wang
Qixiang Zhang
Marawan Elbatel
...
Weifeng Qin
Hao Li
Jialin Liang
Jun Shen
Xiaomeng Li
MedIm
38
0
0
06 May 2025
A Vision-Language Model for Focal Liver Lesion Classification
Song Jian
Hu Yuchang
Wang Hui
Chen Yen-Wei
VLM
MedIm
46
0
0
06 May 2025
Enhancing Target-unspecific Tasks through a Features Matrix
Fangming Cui
Yonggang Zhang
Xuan Wang
Xinmei Tian
Jun Yu
AAML
50
0
0
06 May 2025
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li
Xiaolu Hou
Ziyang Liu
Dingkang Yang
Ziyun Qian
Jiawei Chen
Jinjie Wei
Y. Jiang
Qingyao Xu
Li Zhang
DiffM
156
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
Yankai Jiang
Peng Zhang
D. Yang
Yuan Tian
Hai Lin
Xinyu Wang
MedIm
133
0
0
05 May 2025
Finger Pose Estimation for Under-screen Fingerprint Sensor
Xiongjun Guan
Zhiyu Pan
Jianjiang Feng
Jie Zhou
62
1
0
05 May 2025
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
Dmitriy Shopkhoev
Ammar Ali
Magauiya Zhussip
Valentin Malykh
Stamatios Lefkimmiatis
N. Komodakis
Sergey Zagoruyko
VLM
140
0
0
05 May 2025
MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing
Zinan Guo
Pengze Zhang
Yanze Wu
Chong Mou
Mingcong Liu
Qian He
33
0
0
05 May 2025
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
Utsav Nareti
S. Chattopadhyay
Prolay Mallick
Suraj Kumar
Ayush Vikas Daga
Chandranath Adak
Adarsh Wase
Arjab Roy
23
0
0
05 May 2025
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Lu Ling
C. Lin
Nayeon Lee
Yin Cui
Y. Zeng
Yichen Sheng
Yunhao Ge
Ming-Yu Liu
Aniket Bera
Zhaoshuo Li
VGen
3DV
56
0
0
05 May 2025
Structure Causal Models and LLMs Integration in Medical Visual Question Answering
Zibo Xu
Qiang Li
Weizhi Nie
Weijie Wang
Anan Liu
CML
MedIm
47
0
0
05 May 2025
Using Knowledge Graphs to harvest datasets for efficient CLIP model training
Simon Ging
Sebastian Walter
Jelena Bratulić
Johannes Dienert
Hannah Bast
Thomas Brox
CLIP
27
0
0
05 May 2025
Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models
Sassan Mokhtar
Arian Mousakhan
Silvio Galesso
Jawad Tayyub
Thomas Brox
29
0
0
05 May 2025
GIF: Generative Inspiration for Face Recognition at Scale
Saeed Ebrahimi
Sahar Rahimi
Ali Dabouei
Srinjoy Das
Jeremy M. Dawson
Nasser M. Nasrabadi
CVBM
150
0
0
05 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
51
0
0
05 May 2025
Adversarial Robustness Analysis of Vision-Language Models in Medical Image Segmentation
Anjila Budathoki
Manish Dhakal
AAML
39
0
0
05 May 2025
Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
SungHeon Jeong
Jihong Park
Mohsen Imani
59
0
0
05 May 2025
Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models
Kuofeng Gao
Yufei Zhu
Yiming Li
Jiawang Bai
Yong-Liang Yang
Z. Li
Shu-Tao Xia
41
0
0
05 May 2025
Recent Advances in Out-of-Distribution Detection with CLIP-Like Models: A Survey
Chaohua Li
Enhao Zhang
Chuanxing Geng
Songcan Chen
OODD
VLM
40
0
0
05 May 2025
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
Bojin Wu
Jing Chen
MDE
46
0
0
05 May 2025
VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection
Hao Cheng
Zhiwei Zhao
Yichao He
Zhenzhen Hu
Jia Li
Hao Wu
Richang Hong
43
0
0
05 May 2025