ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision


26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
ArXiv (abs) | PDF | HTML | Github (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
DiffQRCoder: Diffusion-based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement
Jia-Wei Liao
Winston Wang
Tzu-Sian Wang
Li-Xuan Peng
Ju-Hsuan Weng
Cheng-Fu Chou
Jun-Cheng Chen
DiffM
106
2
0
10 Sep 2024
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
Amin Karimi Monsefi
Kishore Prakash Sailaja
Ali Alilooee
Ser-Nam Lim
R. Ramnath
VLM
90
9
0
10 Sep 2024
Towards Generalizable Scene Change Detection
Jaewoo Kim
Uehwan Kim
90
0
0
10 Sep 2024
NeIn: Telling What You Don't Want
Nhat-Tan Bui
Dinh-Hieu Hoang
Quoc-Huy Trinh
Minh-Triet Tran
Truong Nguyen
Susan Gauch
133
2
0
09 Sep 2024
CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
Nan Chen
Mengqi Huang
Zhuowei Chen
Yang Zheng
Lei Zhang
Zhendong Mao
DiffM
127
6
0
09 Sep 2024
Training-Free Point Cloud Recognition Based on Geometric and Semantic Information Fusion
Yan Chen
Di Huang
Zhichao Liao
Xi Cheng
Xinghui Li
Lone Zeng
3DPC
169
1
0
07 Sep 2024
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
Jianbiao Mei
T. Hu
Xuemeng Yang
Licheng Wen
Yu Yang
Tiantian Wei
Yukai Ma
Min Dou
Botian Shi
Yong Liu
VGen, DiffM
152
6
0
06 Sep 2024
GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers
Lorenza Prospero
Abdullah Hamdi
João F. Henriques
Christian Rupprecht
3DGS
101
3
0
06 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-Xiong Wang
121
23
0
05 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
448
1
0
04 Sep 2024
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
Haiyu Wu
Jaskirat Singh
Sicong Tian
Liang Zheng
Kevin W. Bowyer
CVBM
125
4
0
04 Sep 2024
LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models
Lipeng Ma
Weidong Yang
Sihang Jiang
Ben Fei
Mingjie Zhou
Shuhao Li
Bo Xu
Yanghua Xiao
137
0
0
03 Sep 2024
EEG-Language Modeling for Pathology Detection
Sam Gijsen
Kerstin Ritter
130
4
0
02 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
258
15
0
02 Sep 2024
Training-Free Sketch-Guided Diffusion with Latent Optimization
Sandra Zhang Ding
Jiafeng Mao
Kiyoharu Aizawa
DiffM
164
3
0
31 Aug 2024
Multi-Output Distributional Fairness via Post-Processing
Gang Li
Qihang Lin
Ayush Ghosh
Tianbao Yang
152
0
0
31 Aug 2024
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou
Haote Yang
Dairong Chen
Junyan Ye
Tianyi Bai
Jinhua Yu
Songyang Zhang
Dahua Lin
Conghui He
Weijia Li
VLM
149
7
0
30 Aug 2024
Medical Report Generation Is A Multi-label Classification Problem
Yijian Fan
Zhenbang Yang
Rui Liu
Mingjie Li
Xiaojun Chang
MedIm
118
1
0
30 Aug 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
Mustansar Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
258
5
0
30 Aug 2024
Semantics-Oriented Multitask Learning for DeepFake Detection: A Joint Embedding Approach
Mian Zou
Baosheng Yu
Yibing Zhan
Siwei Lyu
Kede Ma
CVBM
124
2
0
29 Aug 2024
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
114
12
0
29 Aug 2024
ClimDetect: A Benchmark Dataset for Climate Change Detection and Attribution
Sungduk Yu
Brian L. White
Anahita Bhiwandiwalla
Musashi Hinck
Matthew Lyle Olson
Tung Nguyen
Vasudev Lal
90
0
0
28 Aug 2024
More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding
Yuan Tang
Xu Han
Xianzhi Li
Qiao Yu
Jinfeng Xu
Yixue Hao
Long Hu
Min Chen
152
3
0
28 Aug 2024
Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration
Xu Zhang
Jiaqi Ma
Guoli Wang
Qian Zhang
Huan Zhang
Lefei Zhang
VLM
152
9
0
28 Aug 2024
MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation
Hyunwoo Kim
Itai Lang
Noam Aigerman
Thibault Groueix
Vladimir G. Kim
Rana Hanocka
AI4CE
100
3
0
27 Aug 2024
NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals
Wei-Bang Jiang
Yansen Wang
Bao-Liang Lu
Dongsheng Li
121
15
0
27 Aug 2024
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
Junyao Ge
Xu Zhang
Yang Zheng
Kaitai Guo
Jimin Liang
134
2
0
27 Aug 2024
Diffusion based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection
Suhee Yoon
Sanghyu Yoon
Ye Seul Sim
Sungik Choi
Kyungeun Lee
Hye-Seung Cho
Hankook Lee
Woohyung Lim
69
0
0
27 Aug 2024
The Benefits of Balance: From Information Projections to Variance Reduction
Lang Liu
Ronak R. Mehta
Soumik Pal
Zaïd Harchaoui
60
0
0
27 Aug 2024
An Embedding is Worth a Thousand Noisy Labels
Francesco Di Salvo
Sebastian Doerrich
Ines Rieger
Christian Ledig
NoLa
132
0
0
26 Aug 2024
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li
Xuewen Liu
Dongrong Fu
Jianquan Li
Qingyi Gu
Kurt Keutzer
Zhen Dong
EGVM, VGen, DiffM
165
2
0
26 Aug 2024
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang
Yuan Dong
Hanwen Jiang
Dejia Xu
Georgios Pavlakos
Qixing Huang
3DGS
166
3
0
23 Aug 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
153
1
0
23 Aug 2024
Dynamics of Meta-learning Representation in the Teacher-student Scenario
Hui Wang
Cho Tung Yip
Bo Li
111
0
0
22 Aug 2024
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design
Artem Snegirev
Maria Tikhonova
Anna Maksimova
Alena Fenogenova
Alexander Abramov
206
6
0
22 Aug 2024
CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction
Rong Han
Xiaohong Liu
Tong Pan
Jing Xu
Xiaoyu Wang
...
Zhenyu Li
Zixuan Wang
Jiangning Song
Guangyu Wang
Ting Chen
79
2
0
21 Aug 2024
Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models
Chun-Yen Shih
Li-Xuan Peng
Jia-Wei Liao
Ernie Chu
Cheng-Fu Chou
Jun-Cheng Chen
AAML, DiffM
86
1
0
21 Aug 2024
Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection
Jingwei Sun
Xuchong Zhang
Changfeng Sun
Qicheng Bai
Hongbin Sun
AAML, DiffM
132
0
0
21 Aug 2024
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
Feipeng Ma
Yizhou Zhou
Hebei Li
Zilong He
Siying Wu
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
169
8
0
21 Aug 2024
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting
Liyao Jiang
Negar Hassanpour
Mohammad Salameh
Mohan Sai Singamsetti
Fengyu Sun
Wei Lu
Di Niu
DiffM
129
2
0
21 Aug 2024
Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
Guofeng Mei
Luigi Riz
Yiming Wang
Fabio Poiesi
ISeg, VLM
115
4
0
20 Aug 2024
Perception-guided Jailbreak against Text-to-Image Models
Yihao Huang
Le Liang
Tianlin Li
Xiaojun Jia
Run Wang
Weikai Miao
G. Pu
Yang Liu
90
11
0
20 Aug 2024
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Alex N. Wang
Christopher Hoang
Yuwen Xiong
Yann LeCun
Mengye Ren
218
0
0
20 Aug 2024
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding
Youjun Zhao
Jiaying Lin
Shuquan Ye
Qianshi Pang
Rynson W. H. Lau
152
2
0
20 Aug 2024
An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs
Eui Jun Hwang
Sukmin Cho
Junmyeong Lee
Jong C. Park
SLR
119
5
0
20 Aug 2024
Understanding Generative AI Content with Embedding Models
Max Vargas
Reilly Cannon
A. Engel
Anand D. Sarwate
Tony Chiang
196
3
0
19 Aug 2024
DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization
Pucheng Dang
Xing Hu
Dong Li
Rui Zhang
Qi Guo
Kaidi Xu
DiffM
85
7
0
18 Aug 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
Hao Fei
DiffM
140
1
0
16 Aug 2024
Beyond the Hype: A dispassionate look at vision-language models in medical scenario
Yang Nan
Huichi Zhou
Xiaodan Xing
Guang Yang
98
3
0
16 Aug 2024
Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models
Lin Zhao
Xiao Chen
Eric Z. Chen
Yikang Liu
Terrence Chen
Shanhui Sun
VLM
107
6
0
16 Aug 2024