ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
Links: arXiv (abs), PDF, HTML, GitHub (29,177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
Aaron Lohner
Francesco Compagno
Jonathan M Francis
A. Oltramari
132
3
0
10 Jan 2025
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Yuwei Fang
Kwot Sin Lee
Ivan Skorokhodov
Kfir Aberman
Jun-Yan Zhu
Ming-Hsuan Yang
Sergey Tulyakov
DiffM, VGen
172
13
0
10 Jan 2025
RadGPT: Constructing 3D Image-Text Tumor Datasets
P. R. Bassi
Mehmet Can Yavuz
Kang Wang
Xiaoxi Chen
Wenxuan Li
S. Decherchi
Andrea Cavalli
Yang Yang
Alan Yuille
Zongwei Zhou
LM&MA, MedIm
116
2
0
08 Jan 2025
Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision
Kangsheng Yin
Quan Liu
Xuelin Shen
Yulin He
Wenhan Yang
Shiqi Wang
VLM
134
0
0
08 Jan 2025
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Yunxing Liu
Pengxiang Li
Zishu Wei
C. Xie
Xueyu Hu
Xinchen Xu
Shengyu Zhang
Xiaotian Han
Hongxia Yang
Leilei Gan
LLMAG, LRM
128
21
0
08 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Tian Jin
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
114
1
0
08 Jan 2025
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
Yuzhou Huang
Ziyang Yuan
Quande Liu
Qiulin Wang
Xintao Wang
Ruimao Zhang
Pengfei Wan
Di Zhang
Kun Gai
VGen, DiffM
128
16
0
08 Jan 2025
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
S. Joshi
Besmira Nushi
Vidhisha Balachandran
Varun Chandrasekaran
Vibhav Vineet
Neel Joshi
Baharan Mirzasoleiman
MLLM, VLM
165
0
0
07 Jan 2025
MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer
Junsheng Luan
Guangyuan Li
Lei Zhao
Wei Xing
DiffM
68
3
0
07 Jan 2025
Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT
Wen-Dong Jiang
Chih-Yung Chang
Diptendu Sinha Roy
135
0
0
07 Jan 2025
Dr. Tongue: Sign-Oriented Multi-label Detection for Remote Tongue Diagnosis
Yiliang Chen
Steven SC Ho
Cheng Xu
Yao Jie Xie
Wing-Fai Yeung
Shengfeng He
Jing Qin
LM&MA
85
0
0
06 Jan 2025
ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking
Tingyang Zhang
Chen Wang
Zhiyang Dou
Qingzhe Gao
Jiahui Lei
Baoquan Chen
Lingjie Liu
3DV
102
0
0
06 Jan 2025
MObI: Multimodal Object Inpainting Using Diffusion Models
Alexandru Buburuzan
Anuj Sharma
John Redford
P. Dokania
Romain Mueller
DiffM
168
1
0
06 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
VLM
390
6
0
05 Jan 2025
Facial Attractiveness Prediction in Live Streaming: A New Benchmark and Multi-modal Method
Haoyang Li
Xiaoyu Ren
Hongjiu Yu
Huiyu Duan
Kai Li
Ying Chen
Libo Wang
Xiongkuo Min
Guangtao Zhai
Xu Liu
CVBM
157
0
0
05 Jan 2025
Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning
Zhongyi Zhou
Chaomin Shen
Pin Yi
Minjie Zhu
Yaxin Peng
439
0
0
04 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM, VLM
126
0
0
03 Jan 2025
Measuring Error Alignment for Decision-Making Systems
Binxia Xu
Antonis Bikakis
Daniel Onah
A. Vlachidis
Luke Dickens
87
0
0
03 Jan 2025
Nested Attention: Semantic-aware Attention Values for Concept Personalization
Or Patashnik
Rinon Gal
Daniil Ostashev
Sergey Tulyakov
Kfir Aberman
Daniel Cohen-Or
DiffM
104
6
0
03 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
148
8
0
03 Jan 2025
Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models
Yifan Zhang
Junhui Hou
126
1
0
03 Jan 2025
Foreground-Covering Prototype Generation and Matching for SAM-Aided Few-Shot Segmentation
S. Park
Subeen Lee
Hyun Seok Seong
Jaejoon Yoo
Jae-Pil Heo
117
1
0
03 Jan 2025
AIM: Additional Image Guided Generation of Transferable Adversarial Attacks
Teng Li
Xingjun Ma
Yu-Gang Jiang
AAML, DiffM
124
0
0
03 Jan 2025
BatStyler: Advancing Multi-category Style Generation for Source-free Domain Generalization
Xiusheng Xu
Lei Qi
Jingyang Zhou
Xin Geng
TTA
150
0
0
03 Jan 2025
Training-free Heterogeneous Model Merging
Zhengqi Xu
Han Zheng
Jie Song
Li Sun
Mingli Song
MoMe
232
1
0
03 Jan 2025
Instruction-Guided Scene Text Recognition
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
188
3
0
03 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM, VLM, LRM
313
59
0
03 Jan 2025
MM-Path: Multi-modal, Multi-granularity Path Representation Learning -- Extended Version
Ronghui Xu
Hanyin Cheng
Chenjuan Guo
Hongfan Gao
Jiaxi Hu
Sean Bin Yang
Bin Yang
143
5
0
03 Jan 2025
SOEDiff: Efficient Distillation for Small Object Editing
Yiming Wu
Qihe Pan
Zhen Zhao
Zicheng Wang
Sifan Long
Ronghua Liang
DiffM
155
0
0
03 Jan 2025
Refining Skewed Perceptions in Vision-Language Models through Visual Representations
Haocheng Dai
Sarang Joshi
VLM
121
0
0
03 Jan 2025
DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data
Yuanpeng Tu
Xi Chen
Ser-Nam Lim
Hengshuang Zhao
166
1
0
03 Jan 2025
Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models
Benjamin Laufer
Jon M. Kleinberg
Hoda Heidari
119
11
0
03 Jan 2025
RealCustom++: Representing Images as Real-Word for Real-Time Customization
Zhendong Mao
Mengqi Huang
Fei Ding
Mingcong Liu
Qian He
Xiaojun Chang
DiffM
147
6
0
03 Jan 2025
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Lijie Tao
Han Zhang
Haizhao Jing
Yu Liu
Kelu Yao
Guoting Wei
Xizhe Xue
111
0
0
03 Jan 2025
ChemDFM-X: Towards Large Multimodal Model for Chemistry
Zihan Zhao
B. Chen
Jingpiao Li
Lu Chen
Liyang Wen
...
Ziping Wan
Yansi Li
Zhongyang Dai
Xin Chen
Kai Yu
AI4CE
215
5
0
03 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM, VLM
180
29
0
03 Jan 2025
Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion
Hebin Wang
Yangning Li
Hai-Tao Zheng
Wenhao Jiang
Hong-Gee Kim
130
0
0
03 Jan 2025
PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM
Runnan Chen
Zhaoqing Wang
Jiepeng Wang
Yuexin Ma
Mingming Gong
Wenping Wang
Tongliang Liu
3DGS
92
3
0
03 Jan 2025
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Yuzhu Cai
Sheng Yin
Yuxi Wei
Chenxin Xu
Weibo Mao
Felix Juefei Xu
Siheng Chen
Yanfeng Wang
EGVM
168
3
0
03 Jan 2025
Context-Aware Detection of Mixed Critical Events using Video Classification
Filza Akhlaq
Alina Arshad
Muhammad Yehya Hayati
Jawwad A. Shamsi
Muhammad Burhan Khan
118
0
0
03 Jan 2025
Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs
Linhao Huang
Xue Jiang
Zhiqiang Wang
Wentao Mo
Xi Xiao
Bo Han
Yongjie Yin
Feng Zheng
AAML
139
4
0
02 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
187
16
0
02 Jan 2025
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
Yuanpeng Tu
Hao Luo
Xi Chen
S. Ji
Xiang Bai
Hengshuang Zhao
DiffM, VGen
122
6
0
02 Jan 2025
Uncovering Memorization Effect in the Presence of Spurious Correlations
Chenyu You
Haocheng Dai
Yifei Min
Jasjeet Sekhon
S. Joshi
James S. Duncan
130
3
0
01 Jan 2025
RORem: Training a Robust Object Remover with Human-in-the-Loop
Ruibin Li
Tao Yang
Song Guo
Lefei Zhang
120
4
0
01 Jan 2025
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang
Hang Zhang
Xin Li
Jiashuo Sun
Yongliang Shen
Weiming Lu
Deli Zhao
Yueting Zhuang
Lidong Bing
VLM
138
2
0
01 Jan 2025
LoVA: Long-form Video-to-Audio Generation
Xin Cheng
Xihua Wang
Yihan Wu
Yuyue Wang
Ruihua Song
VGen, DiffM
97
3
0
31 Dec 2024
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
255
5
0
31 Dec 2024
Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models
Linus Nwankwo
Elmar Rueckert
126
2
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw, LM&MA, LRM
118
29
0
31 Dec 2024