ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIPVLM
ArXiv (abs)PDFHTMLGithub (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Title
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection
Jiaqi Zhu
Shaofeng Cai
Fang Deng
Junran Wu
Junran Wu
125
18
0
15 Apr 2024
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
Jaidev Shriram
Alex Trevithick
Lingjie Liu
Ravi Ramamoorthi
DiffM3DGS
150
59
0
10 Apr 2024
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie
Julong Wei
Zhengjun Wang
Ke Wu
Shansuai Yuan
Kaizhao Zhang
Jie Jia
Jieru Zhao
Zhongxue Gan
Wenchao Ding
116
6
0
10 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
86
31
0
09 Apr 2024
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
Chiara Plizzari
Shubham Goel
Toby Perrett
Jacob Chalk
Angjoo Kanazawa
Dima Damen
91
12
0
07 Apr 2024
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Shenghai Yuan
Jinfa Huang
Yujun Shi
Yongqi Xu
Ruijie Zhu
Bin Lin
Xinhua Cheng
Li-xin Yuan
Jiebo Luo
VGen
157
36
0
07 Apr 2024
Dissecting Query-Key Interaction in Vision Transformers
Dissecting Query-Key Interaction in Vision Transformers
Xu Pan
Aaron Philip
Ziqian Xie
Odelia Schwartz
111
1
0
04 Apr 2024
Faster Diffusion via Temporal Attention Decomposition
Faster Diffusion via Temporal Attention Decomposition
Haozhe Liu
Wentian Zhang
Jinheng Xie
Francesco Faccio
Mengmeng Xu
Tao Xiang
Mike Zheng Shou
Juan-Manuel Perez-Rua
Jürgen Schmidhuber
DiffM
155
24
0
03 Apr 2024
A Survey on Large Language Model-Based Game Agents
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Gaowen Liu
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAGLM&RoAI4CELM&MA
200
57
0
02 Apr 2024
Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
Jiachen Ma
Anda Cao
Zhiqing Xiao
Jie Zhang
Chaonan Ye
Chao Ye
Junbo Zhao
110
32
0
02 Apr 2024
Benchmarking Counterfactual Image Generation
Benchmarking Counterfactual Image Generation
Thomas Melistas
Nikos Spyrou
Nefeli Gkouti
Pedro Sanchez
Athanasios Vlontzos
Yannis Panagakis
G. Papanastasiou
Sotirios A. Tsaftaris
EGVMCML
116
11
0
29 Mar 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
154
47
0
29 Mar 2024
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Huiyi Wang
Haodong Lu
Lina Yao
Dong Gong
KELMCLL
115
11
0
27 Mar 2024
Learning To Guide Human Decision Makers With Vision-Language Models
Learning To Guide Human Decision Makers With Vision-Language Models
Debodeep Banerjee
Stefano Teso
Burcu Sayin
Andrea Passerini
77
1
0
25 Mar 2024
Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks
Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks
Jonathan Salfity
Selma Wanna
Minkyu Choi
Mitch Pryor
184
1
0
25 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
229
12
0
25 Mar 2024
Multiple Object Tracking as ID Prediction
Multiple Object Tracking as ID Prediction
Ruopeng Gao
Yijun Zhang
Limin Wang
180
16
0
25 Mar 2024
Edit3K: Universal Representation Learning for Video Editing Components
Edit3K: Universal Representation Learning for Video Editing Components
Xin Gu
Libo Zhang
Fan Chen
Longyin Wen
Yufei Wang
Tiejian Luo
Sijie Zhu
115
4
0
24 Mar 2024
Cross-domain Multi-modal Few-shot Object Detection via Rich Text
Cross-domain Multi-modal Few-shot Object Detection via Rich Text
Zeyu Shangguan
Daniel Seita
Mohammad Rostami
ObjD
147
1
0
24 Mar 2024
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
Mu Hu
Wei Yin
C. Zhang
Zhipeng Cai
Xiaoxiao Long
Kaixuan Wang
Kaixuan Wang
Gang Yu
Chunhua Shen
Shaojie Shen
3DGS
267
142
0
22 Mar 2024
Controlled Training Data Generation with Diffusion Models
Controlled Training Data Generation with Diffusion Models
Teresa Yeo
Andrei Atanov
Harold Benoit
Aleksandr Alekseev
Ruchira Ray
Pooya Esmaeil Akhoondi
Amir Zamir
99
6
0
22 Mar 2024
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Jingjing Hu
Dan Guo
Kun Li
Zhan Si
Xun Yang
Xiaojun Chang
Meng Wang
110
3
0
21 Mar 2024
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Roberto Henschel
Levon Khachatryan
Daniil Hayrapetyan
Hayk Poghosyan
Vahram Tadevosyan
Zhangyang Wang
Shant Navasardyan
Humphrey Shi
DiffMVGen
207
89
0
21 Mar 2024
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li
William H. Beluch
Margret Keuper
Dan Zhang
Anna Khoreva
DiffMVGen
116
5
0
20 Mar 2024
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling
Daniel Duenias
Brennan Nichyporuk
Tal Arbel
Tammy Riklin-Raviv
95
6
0
20 Mar 2024
TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer
TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer
Eunjee Choi
Jong-Kook Kim
96
2
0
19 Mar 2024
FaceXFormer: A Unified Transformer for Facial Analysis
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan
VS Vibashan
Rama Chellappa
Vishal M. Patel
ViT
108
13
0
19 Mar 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
...
Thomas Kollar
Sergey Levine
Chelsea Finn
Sergey Levine
Chelsea Finn
233
226
0
19 Mar 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
104
4
0
19 Mar 2024
Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
Xiaohao Xu
Yunkang Cao
Huaxin Zhang
Nong Sang
Xiaonan Huang
VLM
105
11
0
17 Mar 2024
GazeFusion: Saliency-Guided Image Generation
GazeFusion: Saliency-Guided Image Generation
Yunxiang Zhang
Nan Wu
Connor Z. Lin
Gordon Wetzstein
Qi Sun
86
0
0
16 Mar 2024
Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation
Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation
Francesco Taioli
Stefano Rosa
A. Castellini
Lorenzo Natale
Alessio Del Bue
Alessandro Farinelli
Marco Cristani
Yiming Wang
103
5
0
15 Mar 2024
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training
Zhixiu Lu
Hailong Li
N. Parikh
Jonathan R. Dillman
Lili He
MedImVLM
115
1
0
15 Mar 2024
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Jin-Young Kim
Hyojun Go
Soonwoo Kwon
Hyun-Gyoon Kim
DiffM
145
6
0
15 Mar 2024
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang
Zhimao Peng
Zhengyuan Xie
Fei Yang
Xialei Liu
Ming-Ming Cheng
122
3
0
15 Mar 2024
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
119
56
0
14 Mar 2024
ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models
ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models
Runyu Ma
Jelle Luijkx
Zlatan Ajanović
Jens Kober
LM&RoLRM
91
9
0
14 Mar 2024
HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation
HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation
D. B. Wang
Hengyu Meng
Zeyu Cai
Zhijing Shao
Qianxi Liu
Lin Wang
Mingming Fan
Xiaohang Zhan
Zhaoxiang Wang
127
3
0
14 Mar 2024
CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification
CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification
Yiming Ma
Victor Sanchez
T. Guha
78
4
0
14 Mar 2024
Specification Overfitting in Artificial Intelligence
Specification Overfitting in Artificial Intelligence
Benjamin Roth
Pedro Henrique Luz de Araujo
Yuxi Xia
Saskia Kaltenbrunner
Christoph Korab
213
1
0
13 Mar 2024
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
MLLMEGVM
163
10
0
13 Mar 2024
Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model
Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model
Yuxuan Zhang
Lifu Wei
Qing Zhang
Yiren Song
DiffM
102
17
0
12 Mar 2024
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Bingqian Lin
Yunshuang Nie
Ziming Wei
Jiaqi Chen
Shikui Ma
Jianhua Han
Hang Xu
Xiaojun Chang
Xiaodan Liang
LM&RoLRM
125
28
0
12 Mar 2024
Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference
Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference
Benjamin Eysenbach
Vivek Myers
Ruslan Salakhutdinov
Sergey Levine
AI4TS
128
12
0
06 Mar 2024
Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps
Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps
Timothy Chen
O. Shorinwa
Joseph Bruno
Javier Yu
Weijia Zeng
Weijia Zeng
Keiko Nagami
Mac Schwager
Mac Schwager
3DGS
116
35
0
05 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLMCLIP
142
12
0
05 Mar 2024
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
Xijia Tao
Shuai Zhong
Lei Li
Qi Liu
Lingpeng Kong
118
30
0
05 Mar 2024
Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation
Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation
Maksim Kuprashevich
Grigorii Alekseenko
Irina Tolstykh
ELM
111
5
0
04 Mar 2024
A Simple-but-effective Baseline for Training-free Class-Agnostic Counting
A Simple-but-effective Baseline for Training-free Class-Agnostic Counting
Yuhao Lin
Hai-Ming Xu
Lingqiao Liu
Javen Qinfeng Shi
98
1
0
03 Mar 2024
SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
Hongjian Liu
Qingsong Xie
Zhijie Deng
Chen Chen
Shixiang Tang
Fueyang Fu
Zheng-Jun Zha
H. Lu
Zheng-jun Zha
99
9
0
03 Mar 2024
Previous
123...282930...333435
Next