ResearchTrend.AI

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Learning Invariant Causal Mechanism from Vision-Language Models
Changwen Zheng
Siyu Zhao
Xingyu Zhang
Jiangmeng Li
Jingyao Wang
CML, BDL, VLM
110
0
0
24 May 2024
Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation
Se-eun Yoon
Hyunsik Jeon
Julian McAuley
80
0
0
23 May 2024
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
Katherine Xu
Lingzhi Zhang
Jianbo Shi
134
17
0
23 May 2024
Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations
Mohammed Baharoon
Jonathan Klein
D. L. Michels
SSL, VLM
128
0
0
23 May 2024
Text-to-Model: Text-Conditioned Neural Network Diffusion for Train-Once-for-All Personalization
Zexi Li
Lingzhi Gao
Chao Wu
AI4CE, DiffM
123
4
0
23 May 2024
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo
Zhenglin Cheng
Xiaoying Tang
Tao R. Lin
MoE
182
8
0
23 May 2024
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2
Simon Damm
M. Laszkiewicz
Johannes Lederer
Asja Fischer
104
8
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
307
54
0
23 May 2024
PerSense: Personalized Instance Segmentation in Dense Images
Muhammad Ibraheem Siddiqui
Muhammad Umer Sheikh
Hassan Abid
Muhammad Haris Khan
VLM
116
0
0
22 May 2024
Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance
Kaifeng Zhang
Zhao-Heng Yin
Weirui Ye
Yang Gao
135
4
0
22 May 2024
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
N. Sebe
Mubarak Shah
EGVM
170
7
0
22 May 2024
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
Jiajie Jin
Yutao Zhu
Xinyu Yang
Chenghao Zhang
Zhicheng Dou
Tong Zhao
Zhao Yang
Ji-Rong Wen
VLM
155
72
0
22 May 2024
LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting
Jia Gong
Shenyu Ji
Lin Geng Foo
Kang Chen
Hossein Rahmani
Jun Liu
3DGS
113
6
0
21 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
138
9
0
20 May 2024
ColorFoil: Investigating Color Blindness in Large Vision and Language Models
Ahnaf Mozib Samin
M. F. Ahmed
Md. Mushtaq Shahriyar Rafee
VLM
103
3
0
19 May 2024
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions
Junzhang Liu
Zhecan Wang
Hammad A. Ayyubi
Haoxuan You
Chris Thomas
Rui Sun
Shih-Fu Chang
Kai-Wei Chang
140
0
0
18 May 2024
HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
107
13
0
16 May 2024
Networking Systems for Video Anomaly Detection: A Tutorial and Survey
Jing Liu
Yang Liu
Jieyu Lin
Jielin Li
Peng Sun
Bo Hu
Liang Song
Azzedine Boukerche
Victor C.M. Leung
195
12
0
16 May 2024
Contextual Emotion Recognition using Large Vision Language Models
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
VLM
107
4
0
14 May 2024
A Survey on Personalized Content Synthesis with Diffusion Models
Xu-Lu Zhang
Xiao Wei
Wengyu Zhang
Jinlin Wu
Jiaxin Wu
Zhen Lei
Zhaoxiang Zhang
Qing Li
EGVM
194
22
0
09 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
146
18
0
08 May 2024
General Place Recognition Survey: Towards Real-World Autonomy
Peng Yin
Jianhao Jiao
Shiqi Zhao
Lingyun Xu
Guoquan Huang
Howie Choset
Sebastian A. Scherer
Jianda Han
157
6
0
08 May 2024
Policy Learning with a Language Bottleneck
Megha Srivastava
Cédric Colas
Dorsa Sadigh
Jacob Andreas
110
3
0
07 May 2024
GREEN: Generative Radiology Report Evaluation and Error Notation
Sophie Ostmeier
Justin Xu
Zhihong Chen
Maya Varma
Louis Blankemeier
...
Arne Edward Michalson
Michael E. Moseley
Curtis P. Langlotz
Akshay S. Chaudhari
Jean-Benoit Delbrouck
MedIm
93
28
0
06 May 2024
UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification
Siddhant Kharbanda
Devaansh Gupta
K. Gururaj
Pankaj Malhotra
Cho-Jui Hsieh
Rohit Babbar
80
1
0
04 May 2024
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Hongyu Qu
Rui Yan
Xiangbo Shu
Haoliang Gao
Peng Huang
Guo-Sen Xie
116
4
0
03 May 2024
Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields
Yuhang Huang
Shilong Zou
Xinwang Liu
K. Xu
DiffM
153
0
0
02 May 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie
Polina Kirichenko
Mark Ibrahim
Mahmoud Assran
Andrew Gordon Wilson
Aaron Courville
Nicolas Ballas
CLIP, VLM
151
23
0
30 Apr 2024
X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models
Emmanuelle Bourigault
Abdullah Hamdi
Amir Jamaludin
MedIm
107
2
0
30 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM, LRM
220
197
0
29 Apr 2024
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman
Noam Rotstein
Roy Ganz
Ron Kimmel
DiffM
121
16
0
28 Apr 2024
Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission
Mingyu Yang
Bowen Liu
Boyang Wang
Hun-Seok Kim
DiffM
91
6
0
27 Apr 2024
ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion
Ziyue Zhang
Mingbao Lin
Rongrong Ji
DiffM
125
3
0
26 Apr 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
Kevin Qinghong Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
97
12
0
25 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
EGVM
231
22
0
25 Apr 2024
MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
Jiaxin Zhuang
Linshan Wu
Qiong Wang
V. Vardhanabhuti
Lin Luo
Hao Chen
116
4
0
24 Apr 2024
SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models
Bo Lin
Yingjing Xu
Xuanwen Bao
Zhou Zhao
Zuyong Zhang
Zhouyang Wang
115
3
0
23 Apr 2024
CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
Wencheng Zhu
Xin Zhou
Pengfei Zhu
Yu Wang
Qinghua Hu
VLM
126
1
0
22 Apr 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
202
27
0
22 Apr 2024
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
Xuzheng Yu
Chen Jiang
Xingning Dong
Tian Gan
Ming Yang
Qingpei Guo
91
2
0
22 Apr 2024
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Chenyang Zhu
Kai Li
Yue Ma
Chunming He
Li Xiu
DiffM
190
29
0
22 Apr 2024
RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
Chengrui Wang
Pengfei Liu
Min Zhou
Ming Zeng
Xubin Li
Tiezheng Ge
Bo Zheng
DiffM
117
5
0
22 Apr 2024
LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation
Haoyu Zheng
Wenqiao Zhang
Yaoke Wang
Hao Zhou
Jiang Liu
Juncheng Li
Zheqi Lv
Siliang Tang
Yueting Zhuang
111
1
0
21 Apr 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Jingqun Tang
Chunhui Lin
Zhen Zhao
Shubo Wei
Binghong Wu
...
Yuliang Liu
Xiang Bai
Can Huang
LRM, VLM, MLLM
175
30
0
19 Apr 2024
MeshLRM: Large Reconstruction Model for High-Quality Meshes
Xinyue Wei
Kai Zhang
Sai Bi
Hao Tan
Fujun Luan
Valentin Deschaintre
Kalyan Sunkavalli
Hao Su
Zexiang Xu
AI4CE
173
81
0
18 Apr 2024
TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation
Tianyi Liang
Jiangqi Liu
Sicheng Song
Shiqi Jiang
Yifei Huang
Changbo Wang
Chenhui Li
141
0
0
18 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
124
1
0
16 Apr 2024
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
Peiyuan Zhi
Zhiyuan Zhang
Muzhi Han
Zeyu Zhang
Zhitian Li
Ziyuan Jiao
Siyuan Huang
LRM, LM&Ro
100
33
0
16 Apr 2024
From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
Jintao Sun
Zhedong Zheng
Gangyi Ding
94
8
0
16 Apr 2024
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zhili Feng
Zenghui Ding
Yining Sun
SSL, VLM
132
7
0
15 Apr 2024