ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP, VLM
ArXiv (abs) · PDF · HTML · GitHub (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Guotao Liang
Baoquan Zhang
Zhiyuan Wen
Junteng Zhao
Yunming Ye
Kola Ye
Yao He
92
0
0
03 Mar 2025
Zero-Shot Head Swapping in Real-World Scenarios
S. Jeong
Taewoong Kang
Hyojin Jang
Jaegul Choo
82
0
0
02 Mar 2025
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Ziyang Zhang
Yang Yu
Yucheng Chen
Xulei Yang
S. Yeo
MedIm
140
2
0
02 Mar 2025
Cyber for AI at SemEval-2025 Task 4: Forgotten but Not Lost: The Balancing Act of Selective Unlearning in Large Language Models
Dinesh Srivasthav P
Bala Mallikarjunarao Garlapati
MU
61
0
0
02 Mar 2025
Solving Instance Detection from an Open-World Perspective
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
108
1
0
01 Mar 2025
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Tianyi Wang
Jianan Fan
Dingxin Zhang
Dongnan Liu
Yong-quan Xia
Heng Huang
Weidong Cai
134
0
0
01 Mar 2025
Advancing AI-Powered Medical Image Synthesis: Insights from MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA
Ojonugwa Oluwafemi Ejiga Peter
Md Mahmudur Rahman
Fahmi Khalifa
DiffM, MedIm
76
1
0
28 Feb 2025
A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images
Zineb Sordo
Eric Chagnon
Daniela Ushizima
EGVM, MedIm
147
1
0
28 Feb 2025
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian
Zhongliang Guo
Bowen Deng
Chun Tong Lei
Shuai Zhao
Chun Pong Lau
Xiaopeng Hong
Michael P. Pound
DiffM
160
1
0
28 Feb 2025
When Unsupervised Domain Adaptation meets One-class Anomaly Detection: Addressing the Two-fold Unsupervised Curse by Leveraging Anomaly Scarcity
Nesryne Mejri
Enjie Ghorbel
Anis Kacem
Pavel Chernakov
Niki Maria Foteinopoulou
Djamila Aouada
106
0
0
28 Feb 2025
Unified Video Action Model
Shuang Li
Yihuai Gao
Dorsa Sadigh
Shuran Song
VGen
132
8
0
28 Feb 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Peijie Wang
Zhong-Zhi Li
Fei Yin
Xin Yang
Dekang Ran
Cheng-Lin Liu
LRM
116
11
0
28 Feb 2025
UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation
Thanet Markchom
Tong Wu
Liting Huang
Huizhi Liang
135
1
0
28 Feb 2025
Towards High-performance Spiking Transformers from ANN to SNN Conversion
Zihan Huang
Xinyu Shi
Zecheng Hao
Tong Bu
Jianhao Ding
Zhaofei Yu
Tiejun Huang
184
7
0
28 Feb 2025
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
Aayush Dhakal
Srikumar Sastry
Subash Khanal
Adeel Ahmad
Eric Xing
Nathan Jacobs
131
0
0
27 Feb 2025
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Itay Benou
Tammy Riklin-Raviv
131
1
0
27 Feb 2025
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
Rui Hu
Delai Qiu
Shuyu Wei
J.N. Zhang
Yining Wang
Shengping Liu
Jitao Sang
AuLLM, VLM
104
0
0
27 Feb 2025
Knowledge Bridger: Towards Training-free Missing Modality Completion
Guanzhou Ke
Shengfeng He
Xinyu Wang
Bo Wang
Guoqing Chao
Yize Zhang
Yi Xie
HeXing Su
173
1
0
27 Feb 2025
Interpreting CLIP with Hierarchical Sparse Autoencoders
Vladimir Zaigrajew
Hubert Baniecki
P. Biecek
254
1
0
27 Feb 2025
Your contrastive learning problem is secretly a distribution alignment problem
Zihao Chen
Chi-Heng Lin
Ran Liu
Jingyun Xiao
Eva L. Dyer
126
1
0
27 Feb 2025
DGFM: Full Body Dance Generation Driven by Music Foundation Models
Xinran Liu
Zhenhua Feng
Diptesh Kanojia
Wenwu Wang
DiffM
138
1
0
27 Feb 2025
EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training
Qingyao Tian
Huai Liao
Xinyan Huang
Bingyu Yang
Dongdong Lei
Sebastien Ourselin
Hongbin Liu
Mamba
125
2
0
26 Feb 2025
Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models
Cao Yuxuan
Wu Jiayang
Alistair Cheong Liang Chuen
Bryan Shan Guanrong
Theodore Lee Chong Jen
Sherman Chann Zhi Shen
258
0
0
25 Feb 2025
Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation
Tianyang Xu
Jiyong Rao
Xiaoning Song
Zhenhua Feng
Xiao Wu
ViT
171
1
0
25 Feb 2025
SYNTHIA: Novel Concept Design with Affordance Composition
Xiaomeng Jin
Jeonghwan Kim
Qingbin Liu
Zhenhailong Wang
Khanh Duy Nguyen
Ansel Blume
Nanyun Peng
Kai-Wei Chang
Heng Ji
DiffM
490
2
0
25 Feb 2025
Progressive Local Alignment for Medical Multimodal Pre-training
Huimin Yan
Xian Yang
Liang Bai
Jiye Liang
102
0
0
25 Feb 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
165
2
0
25 Feb 2025
Examining the Threat Landscape: Foundation Models and Model Stealing
Ankita Raj
Deepankar Varma
Chetan Arora
AAML
274
1
0
25 Feb 2025
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
Yuheng Xue
Nenglun Chen
Jun Liu
Wenyun Sun
3DPC
222
7
0
24 Feb 2025
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
201
4
0
24 Feb 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin
Zehao Xiao
Pan Zhou
Shujian Yu
Jiayi Shen
Jan-Jakob Sonke
E. Gavves
150
1
0
24 Feb 2025
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
Jiahao Nick Li
Zhuohao Jerry Zhang
Zhang
176
2
0
24 Feb 2025
LaRE$^2$: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
Yunpeng Luo
Junlong Du
Ke Yan
Shouhong Ding
DiffM
197
24
0
24 Feb 2025
A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis
Yuli Wu
Fucheng Liu
Rüveyda Yilmaz
Henning Konermann
Peter Walter
Johannes Stegmaier
EGVM, MedIm
116
2
0
24 Feb 2025
Towards Foundation Models for Mixed Integer Linear Programming
Sirui Li
Janardhan Kulkarni
Ishai Menache
Cathy Wu
Beibin Li
109
10
0
24 Feb 2025
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
LRM
96
16
0
24 Feb 2025
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan
Zhenyi Lu
Sichen Liu
Xiaoye Qu
Wei Wei
Yu Cheng
MoE
515
1
0
24 Feb 2025
From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning
Pusen Dong
Tianchen Zhu
Yue Qiu
Haoyi Zhou
Jianxin Li
146
1
0
24 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
235
8
0
24 Feb 2025
R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge
Aladin Djuhera
Vlad-Costin Andrei
Mohsen Pourghasemian
Haris Gacanin
Holger Boche
Walid Saad
MoMe
181
0
0
24 Feb 2025
On the Vulnerability of Concept Erasure in Diffusion Models
Lucas Beerens
Alex D. Richardson
Peng Sun
Dongdong Chen
DiffM
155
2
0
24 Feb 2025
HumanGif: Single-View Human Diffusion with Generative Prior
Shoukang Hu
Takuya Narihira
Kazumi Fukuda
Ryosuke Sawata
Takashi Shibuya
Yuki Mitsufuji
177
2
0
24 Feb 2025
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
161
4
0
24 Feb 2025
CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation
Vishal Thengane
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
L. Yin
Xiatian Zhu
Salman Khan
CLL
456
0
0
24 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
244
3
0
24 Feb 2025
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Sicheng Xie
Haidong Cao
Zejia Weng
Zhen Xing
Shiwei Shen
Jiaqi Leng
Xipeng Qiu
Yanwei Fu
Zuxuan Wu
Yu Jiang
137
0
0
23 Feb 2025
Category-Selective Neurons in Deep Networks: Comparing Purely Visual and Visual-Language Models
Zitong Lu
Yuxin Wang
76
0
0
23 Feb 2025
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Yubo Wang
Jianting Tang
Chaohu Liu
Linli Xu
AAML
180
1
0
23 Feb 2025
Audio Visual Segmentation Through Text Embeddings
Kyungbok Lee
You Zhang
Z. Duan
116
0
0
22 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
511
0
0
21 Feb 2025