ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP VLM
ArXiv (abs) | PDF | HTML | Github (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
OneDiff: A Generalist Model for Image Difference Captioning
Erdong Hu
Longteng Guo
Tongtian Yue
Zijia Zhao
Shuning Xue
Jing Liu
VLM
93
2
0
08 Jul 2024
CountGD: Multi-Modal Open-World Counting
Niki Amini-Naieni
Tengda Han
Andrew Zisserman
ObjD
142
13
0
05 Jul 2024
Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation
Xiang Gao
Zhengbo Xu
Junhan Zhao
Jiaying Liu
DiffM
87
8
0
03 Jul 2024
MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications
Irene Siragusa
Salvatore Contino
Massimo La Ciura
Rosario Alicata
Roberto Pirrone
177
3
0
03 Jul 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
140
93
0
02 Jul 2024
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma
Yonglin Deng
Chen Chen
H. Lu
Zhenyu Yang
VLM DiffM
162
10
0
02 Jul 2024
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Nan Xu
Fei Wang
Sheng Zhang
Hoifung Poon
Muhao Chen
114
7
0
01 Jul 2024
Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
Jayneel Parekh
Quentin Bouniot
Pavlo Mozharovskyi
A. Newson
Florence d'Alché-Buc
SSL
144
1
0
01 Jul 2024
StyleShot: A Snapshot on Any Style
Junyao Gao
Yanchen Liu
Yanan Sun
Yinhao Tang
Yanhong Zeng
Kai Chen
Cairong Zhao
TTA 3DH VLM
162
19
0
01 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
146
31
0
28 Jun 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
185
31
0
28 Jun 2024
Odd-One-Out: Anomaly Detection by Comparing with Neighbors
A. Bhunia
Changjian Li
Hakan Bilen
139
0
0
28 Jun 2024
MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation
Sanggeon Yun
Ryozo Masukawa
Minhyoung Na
Mohsen Imani
108
8
0
27 Jun 2024
A Sanity Check for AI-generated Image Detection
Shilin Yan
Ouxiang Li
Jiayin Cai
Y. Hao
Xiaolong Jiang
Feng-Long Xie
Weidi Xie
VLM
140
37
0
27 Jun 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
Céline Hudelot
Pierre Colombo
VLM
288
29
0
27 Jun 2024
Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach
Yuxuan Wan
Chaozheng Wang
Yi Dong
Wenxuan Wang
Shuqing Li
Yintong Huo
Michael R. Lyu
3DV
159
14
0
24 Jun 2024
Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making
Vivek Myers
Chongyi Zheng
Anca Dragan
Sergey Levine
Benjamin Eysenbach
OffRL
126
14
0
24 Jun 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
150
39
0
24 Jun 2024
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
Delin Qu
Qizhi Chen
Pingrui Zhang
Xianqiang Gao
Bin Zhao
Dong Wang
Xuelong Li
AI4CE
109
8
0
23 Jun 2024
EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation
Tianyu Wei
Shanmin Pang
Qi Guo
Yizhuo Ma
Yihao Huang
Ming-Ming Cheng
Qing Guo
387
2
0
22 Jun 2024
Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors
Peter Lorenz
Mario Fernandez
Jens Müller
Ullrich Köthe
AAML
197
1
0
21 Jun 2024
Low Fidelity Visuo-Tactile Pretraining Improves Vision-Only Manipulation Performance
Selam Gano
Abraham George
A. Farimani
OnRL
83
1
0
21 Jun 2024
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models
Jie Ren
Kangrui Chen
Yingqian Cui
Shenglai Zeng
Hui Liu
Yue Xing
Jiliang Tang
Lingjuan Lyu
93
2
0
21 Jun 2024
A3D: Does Diffusion Dream about 3D Alignment?
Savva Ignatyev
Nina Konovalova
Daniil Selikhanovych
Nikolay Patakin
...
Anton Konushin
Peter Wonka
Alexander Filippov
Evgeny Burnaev
DiffM
161
1
0
21 Jun 2024
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulic
Sofia Michel
J. Andreoli
91
10
0
21 Jun 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM ObjD
149
1
0
21 Jun 2024
Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation
Eyal Michaeli
Ohad Fried
108
1
0
20 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
127
7
0
20 Jun 2024
On AI-Inspired UI-Design
Jialiang Wei
A. Courbis
Thomas Lambolais
Gérard Dray
Walid Maalej
DiffM
85
3
0
19 Jun 2024
ARDuP: Active Region Video Diffusion for Universal Policies
Shuaiyi Huang
Mara Levy
Zhenyu Jiang
Anima Anandkumar
Yuke Zhu
Linxi Fan
De-An Huang
Abhinav Shrivastava
VGen
104
4
0
19 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM VLM
107
28
0
18 Jun 2024
Diffusion Models in Low-Level Vision: A Survey
Chunming He
Yuqi Shen
Chengyu Fang
Fengyang Xiao
Longxiang Tang
Yulun Zhang
W. Zuo
Zhenhua Guo
Xiu Li
VLM DiffM MedIm
188
42
0
17 Jun 2024
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Yulai Zhao
Masatoshi Uehara
Gabriele Scalia
Tommaso Biancalani
Sergey Levine
Ehsan Hajiramezanali
AI4CE
134
7
0
17 Jun 2024
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Robert Hönig
Javier Rando
Nicholas Carlini
Florian Tramèr
WIGM AAML
100
21
0
17 Jun 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
134
4
0
17 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
143
4
0
17 Jun 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLM VGen
108
11
0
15 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James V. Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
196
14
0
14 Jun 2024
Enhancing Domain Adaptation through Prompt Gradient Alignment
Hoang Phan
Lam C. Tran
Quyen Tran
Trung Le
142
1
0
13 Jun 2024
WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals
L. Adam
Vojtěch Čermák
Kostas Papafitsoros
Lukáš Picek
102
2
0
13 Jun 2024
WonderWorld: Interactive 3D Scene Generation from a Single Image
Hong-Xing Yu
Haoyi Duan
Charles Herrmann
William T. Freeman
Jiajun Wu
3DGS VGen
178
46
0
13 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
147
3
0
13 Jun 2024
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
Joseph Spracklen
Raveen Wijewickrama
A. H. M. N. Sakib
Anindya Maiti
Murtuza Jadliwala
122
13
0
12 Jun 2024
Towards Realistic Data Generation for Real-World Super-Resolution
Long Peng
Wenbo Li
Renjing Pei
Jingjing Ren
Xueyang Fu
Yang Wang
Yang Cao
Zheng-Jun Zha
99
20
0
11 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
112
53
0
11 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
98
4
0
10 Jun 2024
FRAG: Frequency Adapting Group for Diffusion Video Editing
Sunjae Yoon
Gwanhyeong Koo
Geonwoo Kim
Chang D. Yoo
DiffM
111
5
0
10 Jun 2024
MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension
Khiem Le
Zhichun Guo
Kaiwen Dong
Xiaobao Huang
B. Nan
Roshni G. Iyer
Xiangliang Zhang
Olaf Wiest
Wei Wang
Nitesh Chawla
95
0
0
10 Jun 2024
Data Augmentation in Earth Observation: A Diffusion Model Approach
Tiago Sousa
B. Ries
N. Guelfi
DiffM
107
2
0
10 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
146
15
0
09 Jun 2024