ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIPVLM
ArXiv (abs)PDFHTMLGithub (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Title
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
Qi Lv
Xiang Deng
Gongwei Chen
Michael Yu Wang
Liqiang Nie
174
8
0
08 Jun 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
168
14
0
08 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAMLVLM
129
13
0
08 Jun 2024
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina
Adriana Valentina Costache
Adrian-Gabriel Chifu
Josiane Mothe
Radu Tudor Ionescu
VLM
143
1
0
07 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
139
37
0
07 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Yu Guo
VGen
261
17
0
06 Jun 2024
MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition
MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition
Stefan Gerd Fritsch
Cennet Oğuz
Vitor Fortes Rey
L. Ray
Maximilian Kiefer-Emmanouilidis
Paul Lukowicz
HAI
82
0
0
06 Jun 2024
Interpreting the Second-Order Effects of Neurons in CLIP
Interpreting the Second-Order Effects of Neurons in CLIP
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
MILM
126
24
0
06 Jun 2024
Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection
Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection
Jash Dalvi
Ali Dabouei
Gunjan Dhanuka
Min Xu
61
0
0
05 Jun 2024
Tiny models from tiny data: Textual and null-text inversion for few-shot distillation
Tiny models from tiny data: Textual and null-text inversion for few-shot distillation
Erik Landolsi
Fredrik Kahl
DiffM
109
1
0
05 Jun 2024
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang
H. Wu
Chunyi Li
Yingjie Zhou
Wei Sun
Xiongkuo Min
Zijian Chen
Xiaohong Liu
Weisi Lin
Guangtao Zhai
EGVM
123
18
0
05 Jun 2024
Feature contamination: Neural networks learn uncorrelated features and fail to generalize
Feature contamination: Neural networks learn uncorrelated features and fail to generalize
Tianren Zhang
Chujie Zhao
Guanyu Chen
Yizhou Jiang
Feng Chen
OODMLTOODD
181
6
0
05 Jun 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Widyadewi Soedarmadji
...
Shengen Yan
Huazhong Yang
Xuefei Ning
Xuefei Ning
Yu Wang
MQVGen
187
35
0
04 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAGLRMELMReLM
173
37
0
04 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Yangfu Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
112
12
0
04 Jun 2024
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
Kengo Uchida
Takashi Shibuya
Yuhta Takida
Naoki Murata
Shusuke Takahashi
Shusuke Takahashi
Yuki Mitsufuji
VGen
132
5
0
04 Jun 2024
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra
Angela Dai
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Fahad Shahbaz Khan
VLMISeg
178
6
0
04 Jun 2024
Proxy Denoising for Source-Free Domain Adaptation
Proxy Denoising for Source-Free Domain Adaptation
Song Tang
Wenxin Su
Mao Ye
Jianwei Zhang
Xiatian Zhu
Xiatian Zhu
149
2
0
03 Jun 2024
Robust Classification by Coupling Data Mollification with Label Smoothing
Robust Classification by Coupling Data Mollification with Label Smoothing
Markus Heinonen
Ba-Hien Tran
Michael Kampffmeyer
Maurizio Filippone
150
1
0
03 Jun 2024
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark
Li Lin
Santosh
Xin Eric Wang
Shu Hu
Shu Hu
EGVM
140
12
0
02 Jun 2024
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Hyunseung Kim
ByungKun Lee
Hojoon Lee
Dongyoon Hwang
Donghu Kim
Jaegul Choo
135
1
0
01 Jun 2024
BeFA: A General Behavior-driven Feature Adapter for Multimedia Recommendation
BeFA: A General Behavior-driven Feature Adapter for Multimedia Recommendation
Qile Fan
Penghang Yu
Zhiyi Tan
Bing-Kun Bao
Guanming Lu
242
1
0
01 Jun 2024
Reward Machines for Deep RL in Noisy and Uncertain Environments
Reward Machines for Deep RL in Noisy and Uncertain Environments
Andrew C. Li
Zizhao Chen
Toryn Q. Klassen
Pashootan Vaezipoor
Rodrigo Toro Icarte
Sheila A. McIlraith
137
7
0
31 May 2024
Amortizing intractable inference in diffusion models for vision, language, and control
Amortizing intractable inference in diffusion models for vision, language, and control
S. Venkatraman
Moksh Jain
Luca Scimeca
Minsu Kim
Marcin Sendera
...
Alexandre Adam
Jarrid Rector-Brooks
Yoshua Bengio
Glen Berseth
Nikolay Malkin
170
32
0
31 May 2024
Information Theoretic Text-to-Image Alignment
Information Theoretic Text-to-Image Alignment
Chao Wang
Giulio Franzese
A. Finamore
Massimo Gallo
Pietro Michiardi
131
0
0
31 May 2024
Scaling White-Box Transformers for Vision
Scaling White-Box Transformers for Vision
Jinrui Yang
Xianhang Li
Druv Pai
Yuyin Zhou
Yi-An Ma
Yaodong Yu
Cihang Xie
ViT
81
9
0
30 May 2024
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour
Victor Besnier
Vicky Kalogeiton
David Picard
DiffM
110
2
0
30 May 2024
TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
Minghao Guo
Bohan Wang
Kaiming He
Wojciech Matusik
3DGS
139
7
0
30 May 2024
Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion
Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion
Jiangkai Wu
Liming Liu
Yunpeng Tan
Junlin Hao
Xinggong Zhang
120
3
0
30 May 2024
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
Phillip Howard
Kathleen C. Fraser
Anahita Bhiwandiwalla
S. Kiritchenko
122
13
0
30 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
211
5
0
29 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLMISeg
134
5
0
28 May 2024
Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification
Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification
Weizhen He
Yiheng Deng
Yunfeng Yan
Feng Zhu
Yizhou Wang
Lei Bai
Qingsong Xie
Donglian Qi
Wanli Ouyang
Shixiang Tang
154
3
0
28 May 2024
An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates
An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates
Albin Soutif--Cormerais
Simone Magistri
Joost van de Weijer
Andew D. Bagdanov
100
2
0
28 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
Anton Van Den Hengel
VLM
116
1
0
27 May 2024
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang
Jun Hao Liew
Hanshu Yan
Yuyang Yin
Yao Zhao
Yunchao Wei
Yunchao Wei
DiffM
185
7
0
27 May 2024
Ensembling Diffusion Models via Adaptive Feature Aggregation
Ensembling Diffusion Models via Adaptive Feature Aggregation
Cong Wang
Kuan Tian
Yonghang Guan
Jun Zhang
Zhiwei Jiang
Fei Shen
Xiao Han
123
6
0
27 May 2024
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi
Hengyuan Zhang
Yatian Wang
J. Pan
Chen Liu
...
Qixun Zhang
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Qi-fei Liu
DiffMSLR
146
7
0
27 May 2024
Smoke and Mirrors in Causal Downstream Tasks
Smoke and Mirrors in Causal Downstream Tasks
Riccardo Cadei
Lukas Lindorfer
Sylvia Cremer
Cordelia Schmid
Francesco Locatello
CML
114
6
0
27 May 2024
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Kuan-Chih Huang
Xiangtai Li
Lu Qi
Shuicheng Yan
Ming-Hsuan Yang
LRM
148
12
0
27 May 2024
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
Zhuoling Li
Xiaogang Xu
Zhenhua Xu
Sernam Lim
Hengshuang Zhao
LM&Ro
124
2
0
27 May 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Zhongyu Wei
LRMMLLM
167
17
0
27 May 2024
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang
Yukun Zhou
Mingjia Shi
Zhihang Yuan
Yuzhang Shang
Yuzhang Shang
Hanwang Zhang
Hanwang Zhang
Yang You
136
14
0
27 May 2024
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
Kuo-Han Hung
Pang-Chi Lo
Jia-Fong Yeh
Han-Yuan Hsu
Yi-Ting Chen
Winston H. Hsu
137
0
0
26 May 2024
Unsupervised Meta-Learning via In-Context Learning
Unsupervised Meta-Learning via In-Context Learning
Anna Vettoruzzo
Lorenzo Braccaioli
Joaquin Vanschoren
M. Nowaczyk
SSL
118
1
0
25 May 2024
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
Yibo Zhang
Lihong Wang
Changqing Zou
Tieru Wu
Rui Ma
DiffM
93
4
0
24 May 2024
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Feng Liang
Akio Kodaira
Chenfeng Xu
Masayoshi Tomizuka
Kurt Keutzer
Diana Marculescu
DiffMVGen
172
9
0
24 May 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLMDiffM
151
22
0
24 May 2024
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
Abdelrahman Abdelhamed
Mahmoud Afifi
Alec Go
MLLMVLM
105
3
0
24 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Dinesh Manocha
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
177
38
0
24 May 2024
Previous
123...262728...333435
Next