ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision


26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 9,750 papers shown
Wasserstein Distances Made Explainable: Insights into Dataset Shifts and Transport Phenomena
Philip Naumann
Jacob R. Kauffmann
G. Montavon
29
0
0
09 May 2025
Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects
Tobias Preintner
Weixuan Yuan
Qi Huang
Adrian König
Thomas Bäck
E. Raponi
Niki van Stein
29
0
0
09 May 2025
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
Henry Zheng
Hao Shi
Qihang Peng
Yong Xien Chng
Rui Huang
Yepeng Weng
Zhongchao Shi
Gao Huang
74
1
0
08 May 2025
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
Xavier Hu
Y. Chen
Shengyu Zhang
Hongxia Yang
Fan Wu
Fei Wu
LLMAG
167
0
0
08 May 2025
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models
F. Khan
Jun Chen
Youssef Mohamed
Chun-Mei Feng
Mohamed Elhoseiny
VLM
33
0
0
08 May 2025
Visual Affordances: Enabling Robots to Understand Object Functionality
Tommaso Apicella
Alessio Xompero
Andrea Cavallaro
43
0
0
08 May 2025
Concept-Based Unsupervised Domain Adaptation
Xinyue Xu
Y. Hu
Hui Tang
Yi Qin
Lu Mi
Hao Wang
Xiaomeng Li
50
0
0
08 May 2025
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang
Bo Feng
Zhengfeng Lai
Mingze Xu
Shiyu Li
Weifeng Ge
Afshin Dehghan
Meng Cao
Ping-Chia Huang
OffRL
51
0
0
08 May 2025
OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging
Sifan Song
Siyeop Yoon
Pengfei Jin
Sekeun Kim
Matthew Tivnan
...
Zhiliang Lyu
Dufan Wu
Ning Guo
Xiang Li
Quanzheng Li
OOD
ViT
64
0
0
08 May 2025
SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation
Yonwoo Choi
3DGS
VGen
62
0
0
08 May 2025
ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
Wanjiang Weng
Xiaofeng Tan
Hongsong Wang
Pan Zhou
VGen
51
0
0
08 May 2025
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
49
0
0
08 May 2025
Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models
Wei Peng
Kang Liu
Jianchen Hu
Meng Zhang
VLM
LM&MA
50
0
0
08 May 2025
Does CLIP perceive art the same way we do?
Andrea Asperti
Leonardo Dessì
Maria Chiara Tonetti
Nico Wu
48
0
0
08 May 2025
Position: Epistemic Artificial Intelligence is Essential for Machine Learning Models to Know When They Do Not Know
Shireen Kudukkil Manchingal
Fabio Cuzzolin
56
0
0
08 May 2025
UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model
T. Kaiser
Thomas Norrenbrock
Bodo Rosenhahn
48
0
0
08 May 2025
ULFine: Unbiased Lightweight Fine-tuning for Foundation-Model-Assisted Long-Tailed Semi-Supervised Learning
Enhao Zhang
Chaohua Li
Chuanxing Geng
Songcan Chen
56
0
0
08 May 2025
OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning
Cong Hua
Qianqian Xu
Zhiyong Yang
Zitai Wang
Shilong Bao
Qingming Huang
VLM
55
1
0
08 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
53
0
0
08 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
70
0
0
08 May 2025
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
Aishwarya Venkataramanan
P. Bodesheim
Joachim Denzler
BDL
VLM
64
0
0
08 May 2025
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing
Tong Wang
Ting Liu
Xiaochao Qu
Chengjing Wu
Luoqi Liu
Xiaolin Hu
DiffM
58
0
0
08 May 2025
Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization
Xi Yang
Songsong Duan
Nannan Wang
Xinbo Gao
WSOL
78
0
0
08 May 2025
Split Matching for Inductive Zero-shot Semantic Segmentation
Jialei Chen
Xu Zheng
Dongyue Li
Chong Yi
Seigo Ito
D. Paudel
Luc Van Gool
Hiroshi Murase
Daisuke Deguchi
VLM
54
0
0
08 May 2025
InstanceGen: Image Generation with Instance-level Instructions
Etai Sella
Yanir Kleiman
Hadar Averbuch-Elor
33
0
0
08 May 2025
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
50
0
0
08 May 2025
Looking Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models
Aarti Ghatkesar
Uddeshya Upadhyay
Ganesh Venkatesh
VLM
38
0
0
08 May 2025
FLAM: Frame-Wise Language-Audio Modeling
Yusong Wu
Christos Tsirigotis
Ke Chen
Cheng-Zhi Anna Huang
Aaron C. Courville
Oriol Nieto
Prem Seetharaman
Justin Salamon
50
0
0
08 May 2025
Fine-Tuning Video-Text Contrastive Model for Primate Behavior Retrieval from Unlabeled Raw Videos
Giulio Cesare Mastrocinque Santo
Patrícia Izar
Irene Delval
Victor de Napole Gregolin
Nina S. T. Hirata
VGen
40
0
0
08 May 2025
X-Driver: Explainable Autonomous Driving with Vision-Language Models
Wei Liu
J. A. Zhang
Binxiong Zheng
Yufeng Hu
Yingzhan Lin
Zengfeng Zeng
VLM
LRM
60
0
0
08 May 2025
Generating Physically Stable and Buildable LEGO Designs from Text
Ava Pun
Kangle Deng
Ruixuan Liu
Deva Ramanan
Changliu Liu
Jun-Yan Zhu
69
0
0
08 May 2025
Learning to Drive Anywhere with Model-Based Reannotation
Noriaki Hirose
Lydia Ignatova
Kyle Stachowicz
Catherine Glossop
Sergey Levine
Dhruv Shah
24
0
0
08 May 2025
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar
Gayatri S Deshmukh
Yalcin Tur
Ulas Bagci
MedIm
53
0
0
08 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
44
0
0
08 May 2025
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Agnese Chiatti
Sara Bernardini
Lara Shibelski Godoy Piccolo
Viola Schiaffonati
Matteo Matteucci
62
0
0
08 May 2025
PADriver: Towards Personalized Autonomous Driving
Genghua Kou
Fan Jia
Weixin Mao
Yong-Jin Liu
Yucheng Zhao
Ziheng Zhang
Osamu Yoshie
Tiancai Wang
Y. Li
Xinming Zhang
49
0
0
08 May 2025
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li
Weijian Ma
Xueyang Li
Yunzhong Lou
G. Zhou
Xiangdong Zhou
34
0
0
07 May 2025
Componential Prompt-Knowledge Alignment for Domain Incremental Learning
Kunlun Xu
Xu Zou
Gang Hua
Jiahuan Zhou
CLL
80
0
0
07 May 2025
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
Zilong Chen
Yikai Wang
Wenqiang Sun
Feng Wang
Yiwen Chen
Huaping Liu
34
0
0
07 May 2025
DMRL: Data- and Model-aware Reward Learning for Data Extraction
Zhiqiang Wang
Ruoxi Cheng
31
0
0
07 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Yulin Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu
Daniel Fu
Noah Tan
Itai Lang
Rana Hanocka
3DH
45
0
0
07 May 2025
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $α$-$β$-Divergence
Guanghui Wang
Zhiyong Yang
Zhilin Wang
Shi Wang
Qianqian Xu
Q. Huang
42
0
0
07 May 2025
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin
Zhu Xu
Yang Liu
24
0
0
07 May 2025
TetWeave: Isosurface Extraction using On-The-Fly Delaunay Tetrahedral Grids for Gradient-Based Mesh Optimization
Alexandre Binninger
Ruben Wiersma
Philipp Herholz
O. Sorkine-Hornung
141
0
0
07 May 2025
Multi-turn Consistent Image Editing
Zijun Zhou
Yingying Deng
Xiangyu He
Weiming Dong
Fan Tang
50
0
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
57
0
0
07 May 2025
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
Feng Xiao
Hongbin Xu
Guocan Zhao
Wenxiong Kang
50
0
0
07 May 2025
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava
Xiang Zhang
He Wen
Chenru Wen
Zhuowen Tu
DiffM
34
0
0
07 May 2025
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li
Y. Liu
Haoqin Tu
Hongru Zhu
Cihang Xie
VLM
145
0
0
07 May 2025