ResearchTrend.AI
arXiv: 2103.00020
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 9,770 papers shown
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
Cheng Chen
Sijie Zhu
DiffM
81
1
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Lu Ling
C. Lin
Nayeon Lee
Yin Cui
Y. Zeng
Yichen Sheng
Yunhao Ge
Ming-Yu Liu
Aniket Bera
Zhaoshuo Li
VGen
3DV
56
0
0
05 May 2025
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
Yankai Jiang
Peng Zhang
D. Yang
Yuan Tian
Hai Lin
Xinyu Wang
MedIm
133
0
0
05 May 2025
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
Jerome Quenum
Wen-Han Hsieh
Tsung-Han Wu
Ritwik Gupta
Trevor Darrell
David M. Chan
MLLM
VLM
54
0
0
05 May 2025
Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
SungHeon Jeong
Jihong Park
Mohsen Imani
59
0
0
05 May 2025
Using Knowledge Graphs to harvest datasets for efficient CLIP model training
Simon Ging
Sebastian Walter
Jelena Bratulić
Johannes Dienert
Hannah Bast
Thomas Brox
CLIP
27
0
0
05 May 2025
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li
Xiaolu Hou
Ziyang Liu
Dingkang Yang
Ziyun Qian
Jiawei Chen
Jinjie Wei
Y. Jiang
Qingyao Xu
Li Zhang
DiffM
156
0
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
60
2
0
04 May 2025
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
M. Tomizuka
Xue Yang
Junchi Yan
Mingyu Ding
LM&Ro
VLM
69
3
0
04 May 2025
Improving Physical Object State Representation in Text-to-Image Generative Systems
Tianle Chen
Chaitanya Chakka
Deepti Ghadiyaram
34
0
0
04 May 2025
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang
Mengqi Huang
Yijing Tu
Zhendong Mao
VGen
69
0
0
04 May 2025
TxP: Reciprocal Generation of Ground Pressure Dynamics and Activity Descriptions for Improving Human Activity Recognition
L. Ray
Lars Krupp
Vitor Fortes Rey
Bo Zhou
Sungho Suh
Paul Lukowicz
AI4CE
132
0
0
04 May 2025
Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
Yuchen Wang
X. Bai
X. Li
Weili Guan
Liqiang Nie
Xinyang Chen
VLM
44
0
0
04 May 2025
Parameter-Efficient Transformer Embeddings
Henry Ndubuaku
Mouad Talhi
24
0
0
04 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP
CoGe
VLM
36
0
0
04 May 2025
Robust AI-Generated Face Detection with Imbalanced Data
Yamini Sri Krubha
Aryana Hou
Braden Vester
Web Walker
Qing Guo
Li Lin
Shu Hu
29
0
0
04 May 2025
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li
Lingyun Xu
M. Zhang
Jiaming Liu
Yan Shen
...
Jiahui Xu
Liang Heng
Siyuan Huang
S. Zhang
Hao Dong
LM&Ro
51
0
0
04 May 2025
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
128
0
0
04 May 2025
Segment Any RGB-Thermal Model with Language-aided Distillation
Dong Xing
Xianxun Zhu
Wei Zhou
Qika Lin
Hang Yang
Yuqing Wang
VLM
61
0
0
04 May 2025
DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion
Haoteng Li
Zhao Yang
Zezhong Qian
Gongpeng Zhao
Yuqi Huang
Jun-chen Yu
Huazheng Zhou
Longjun Liu
103
1
0
03 May 2025
Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
Sicong Li
Qianqian Xu
Zhiyong Yang
Zitai Wang
Li Zhang
Xiaochun Cao
Q. Huang
67
0
0
03 May 2025
Efficient Multi Subject Visual Reconstruction from fMRI Using Aligned Representations
Christos Zangos
Danish Ebadulla
Thomas C. Sprague
Ambuj Singh
53
0
0
03 May 2025
PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
Trisanth Srinivasan
Santosh Patapati
36
0
0
03 May 2025
Topology-Aware CLIP Few-Shot Learning
Dazhi Huang
VLM
38
0
0
03 May 2025
Rethinking Score Distilling Sampling for 3D Editing and Generation
Xingyu Miao
Haoran Duan
Yang Long
J. Han
46
0
0
03 May 2025
MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization
Chenghong Li
Hongjie Liao
Yihao Zhi
Xihe Yang
Zhengwentai Sun
Jiahao Chang
Shuguang Cui
Xiaoguang Han
3DH
57
0
0
03 May 2025
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models
Chaomeng Chen
Zitong Yu
J. Dong
Sen Su
L. Shen
Shutao Xia
Xiaochun Cao
FedML
VLM
146
0
0
03 May 2025
Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs
Dongxing Yu
31
0
0
03 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
56
0
0
03 May 2025
PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach
Nitin Rai
Arnold W. Schumann
Nathan Boyd
MedIm
39
0
0
03 May 2025
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
Congqi Cao
Lanshu Hu
Yating Yu
Y. Zhang
VLM
141
0
0
03 May 2025
RAGAR: Retrieval Augment Personalized Image Generation Guided by Recommendation
Run Ling
Luu Anh Tuan
Yuting Liu
G. Guo
Linying Jiang
Xingwei Wang
DiffM
54
0
0
03 May 2025
ABE: A Unified Framework for Robust and Faithful Attribution-Based Explainability
Zhiyu Zhu
Jiayu Zhang
Zhibo Jin
Fang Chen
Jianlong Zhou
FAtt
24
0
0
03 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
58
0
0
03 May 2025
VSC: Visual Search Compositional Text-to-Image Diffusion Model
Do Huu Dat
Nam Hyeonu
Po Yuan Mao
Tae-Hyun Oh
DiffM
CoGe
64
0
0
02 May 2025
PREMISE: Matching-based Prediction for Accurate Review Recommendation
Wei Han
Hui Chen
Soujanya Poria
47
0
0
02 May 2025
Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
Gaozheng Pei
Ke Ma
Yingfei Sun
Qianqian Xu
Q. Huang
DiffM
40
0
0
02 May 2025
Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation
Daniele Molino
Francesco Di Feola
Linlin Shen
Paolo Soda
V. Guarrasi
MedIm
LM&MA
67
0
0
02 May 2025
Scalability Matters: Overcoming Challenges in InstructGLM with Similarity-Degree-Based Sampling
Hyun Lee
Chris Yi
Maminur Islam
B.D.S. Aritra
33
0
0
02 May 2025
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
Gabriel Sarch
Balasaravanan Thoravi Kumaravel
Sahithya Ravi
Vibhav Vineet
A. D. Wilson
155
0
0
02 May 2025
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Kai Hu
Weichen Yu
L. Zhang
Alexander Robey
Andy Zou
Chengming Xu
Haoqi Hu
Matt Fredrikson
AAML
VLM
64
1
0
02 May 2025
When Dynamic Data Selection Meets Data Augmentation
Steve Yang
Peng Ye
Furao Shen
Dongzhan Zhou
26
0
0
02 May 2025
VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models
Mohammadreza Teymoorianfard
Shiqing Ma
Amir Houmansadr
WIGM
67
0
0
02 May 2025
Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs
Hari Chandana Kuchibhotla
Sai Srinivas Kancheti
Abbavaram Gowtham Reddy
Vineeth N. Balasubramanian
45
0
0
02 May 2025
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
Changhe Chen
Quantao Yang
Xiaohao Xu
Nima Fazeli
Olov Andersson
26
0
0
02 May 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Jenq-Neng Hwang
AI4TS
151
0
0
02 May 2025
On the effectiveness of Large Language Models in the mechanical design domain
Daniele Grandi
Fabian Riquelme
24
0
0
02 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
Hongsheng Li
LRM
69
2
0
01 May 2025
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI
Lik Hang Kenny Wong
Xueyang Kang
Kaixin Bai
Jianwei Zhang
56
0
0
01 May 2025