ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
ArXiv (abs) | PDF | HTML | GitHub (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Text-to-Image Rectified Flow as Plug-and-Play Priors
Xiaofeng Yang
Cheng Chen
Xulei Yang
Fayao Liu
Guosheng Lin
DiffM
114
7
0
21 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
136
9
0
21 Feb 2025
Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
Boyu Mi
Hanqing Wang
Tai Wang
Yilun Chen
Jiangmiao Pang
110
0
0
21 Feb 2025
3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation
Hansheng Chen
Bokui Shen
Yulin Liu
Ruoxi Shi
Linqi Zhou
Connor Z. Lin
Jiayuan Gu
H. Su
Gordon Wetzstein
Leonidas Guibas
170
4
0
21 Feb 2025
Robust Concept Erasure Using Task Vectors
Minh Pham
Kelly O. Marshall
Chinmay Hegde
Niv Cohen
177
20
0
21 Feb 2025
Deep learning based infrared small object segmentation: Challenges and future directions
Zhengeng Yang
Hongshan Yu
Jianjun Zhang
Qiang Tang
Ajmal Mian
209
3
0
21 Feb 2025
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
D. She
Mushui Liu
Jingxuan Pang
Jin Wang
Zhen Yang
...
Yi Wang
Qihan Huang
Haobin Tang
YunLong Yu
Siming Fu
VGen
208
5
0
21 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
193
3
0
21 Feb 2025
Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Yongqi Dong
Xingmin Lu
Ruohan Li
Wei Song
B. Arem
Haneen Farah
ViT
169
1
0
21 Feb 2025
Neural Attention Search
Difan Deng
Marius Lindauer
137
0
0
21 Feb 2025
X-IL: Exploring the Design Space of Imitation Learning Policies
Xiaogang Jia
Atalay Donat
Xi Huang
Xuan Zhao
Denis Blessing
...
Han A. Wang
Hanyi Zhang
Qian Wang
Rudolf Lioutikov
Gerhard Neumann
141
1
0
20 Feb 2025
LaVCa: LLM-assisted Visual Cortex Captioning
Takuya Matsuyama
Shinji Nishimoto
Yu Takagi
114
1
0
20 Feb 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Yue Yang
Ajay Patel
Matt Deitke
Tanmay Gupta
Luca Weihs
...
Mark Yatskar
Chris Callison-Burch
Ranjay Krishna
Aniruddha Kembhavi
Christopher Clark
SyDa
186
3
0
20 Feb 2025
CrossOver: 3D Scene Cross-Modal Alignment
S. Sarkar
O. Mikšík
Marc Pollefeys
Daniel Barath
Iro Armeni
3DPC
142
2
0
20 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
175
0
0
20 Feb 2025
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
137
0
0
20 Feb 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
Hao Zhang
Xiang Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Yue Yang
Zhe Gan
CLIP, VLM
117
10
0
20 Feb 2025
Myna: Masking-Based Contrastive Learning of Musical Representations
Ori Yonay
Tracy Hammond
Tianbao Yang
AAML
214
0
0
20 Feb 2025
FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise
Yunlong Yuan
Yuanfan Guo
Chunwei Wang
Wei Zhang
Hang Xu
L. Zhang
DiffM, VGen
189
3
0
20 Feb 2025
A Template Is All You Meme
Luke Bates
Peter Ebert Christensen
Preslav Nakov
Iryna Gurevych
VLM
117
1
0
20 Feb 2025
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments
Luca Barsellotti
Roberto Bigazzi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
207
1
0
20 Feb 2025
Controllable Unlearning for Image-to-Image Generative Models via $\varepsilon$-Constrained Optimization
Xiaohua Feng
Chao-Jun Chen
Yuyuan Li
Lulu Zhang
Longfei Li
Jun Zhou
Xiaolin Zheng
MU
125
0
0
20 Feb 2025
Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
Yizhuo Lu
Changde Du
Chong Wang
Xuanliu Zhu
Liuyun Jiang
Xujin Li
Huiguang He
VGen
222
4
0
20 Feb 2025
Towards Fusing Point Cloud and Visual Representations for Imitation Learning
Atalay Donat
Xiaogang Jia
Xi Huang
Aleksandar Taranovic
Denis Blessing
Ge Li
Hongyi Zhou
Hanyi Zhang
Rudolf Lioutikov
Gerhard Neumann
3DPC, SSL
135
1
0
20 Feb 2025
DiffGuard: Text-Based Safety Checker for Diffusion Models
Massine El Khader
Elias Al Bouzidi
Abdellah Oumida
Mohammed Sbaihi
Eliott Binard
Jean-Philippe Poli
Wassila Ouerdane
Boussad Addad
Katarzyna Kapusta
DiffM
190
0
0
20 Feb 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
Shengguang Wu
Fan-Yun Sun
Kaiyue Wen
Nick Haber
VLM
132
3
0
19 Feb 2025
Quantifying Memorization and Parametric Response Rates in Retrieval-Augmented Vision-Language Models
Peter Carragher
Abhinand Jha
R Raghav
Kathleen M. Carley
RALM
118
0
0
19 Feb 2025
MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching
Yen-Siang Wu
Chi-Pin Huang
Fu-En Yang
Yu-Jie Wang
DiffM, VGen
111
1
0
18 Feb 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
MQ, ALM
174
0
0
18 Feb 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu
Yuvan Sharma
Haoru Xue
Giscard Biamby
Junyi Zhang
Ziteng Ji
Trevor Darrell
Roei Herzig
146
2
0
18 Feb 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing
Yuping Wang
Peiran Li
Ruizheng Bai
Yansen Wang
Chan-wei Hu
Chengxuan Qian
Huaxiu Yao
Zhengzhong Tu
179
8
0
18 Feb 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
Xinlong Chen
Yuanxing Zhang
Chongling Rao
Yushuo Guan
Qingbin Liu
Fuzheng Zhang
Chengru Song
Qiang Liu
Di Zhang
Tieniu Tan
73
2
0
18 Feb 2025
Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios
Liangqi Lei
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
WIGM
104
0
0
18 Feb 2025
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
Kung-Hsiang Huang
Can Qin
Haoyi Qiu
Philippe Laban
Shafiq Joty
Caiming Xiong
Chien-Sheng Wu
VLM
299
5
0
17 Feb 2025
When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding
Pingping Zhang
Jinlong Li
Kecheng Chen
Meng Wang
Long Xu
Haoliang Li
N. Sebe
Sam Kwong
Shiqi Wang
VGen
161
3
0
17 Feb 2025
On the Statistical Complexity of Estimating Vendi Scores from Empirical Data
Azim Ospanov
Farzan Farnia
165
3
0
17 Feb 2025
SuperMerge: An Approach For Gradient-Based Model Merging
Haoyu Yang
Zheng Zhang
Saket Sathe
MoMe
206
0
0
17 Feb 2025
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
Zichen Wen
Yifeng Gao
Weijia Li
Conghui He
Linfeng Zhang
LRM
132
6
0
17 Feb 2025
Simplifying DINO via Coding Rate Regularization
Ziyang Wu
Jingyuan Zhang
Druv Pai
Xinze Wang
Chandan Singh
Jianwei Yang
Jianfeng Gao
Yi-An Ma
509
1
0
17 Feb 2025
Object-Centric Image to Video Generation with Language Guidance
Angel Villar-Corrales
Gjergj Plepi
Sven Behnke
DiffM, VGen, OCL
235
1
0
17 Feb 2025
Conformal Prediction Sets Can Cause Disparate Impact
Jesse C. Cresswell
Bhargava Kumar
Yi Sui
Mouloud Belbahri
FaML
521
2
0
17 Feb 2025
Hedge Fund Portfolio Construction Using PolyModel Theory and iTransformer
Siqiao Zhao
Zhikang Dong
Zeyu Cao
Raphael Douady
106
6
0
17 Feb 2025
Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey
Ruiyao Xu
Kaize Ding
117
6
0
17 Feb 2025
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang
Bowen Jin
Jiacheng Shen
Sirui Ding
Qiaoyu Tan
Jiawei Han
182
2
0
17 Feb 2025
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering
Yanpeng Zhao
Yiwei Hao
Siyu Gao
Yunbo Wang
Xiaokang Yang
OCL
253
1
0
17 Feb 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
226
0
0
17 Feb 2025
Phantom: Subject-consistent video generation via cross-modal alignment
Lijie Liu
Tianxiang Ma
Bingchuan Li
Zhuowei Chen
Jiawei Liu
Qian He
Xinglong Wu
DiffM, VGen
162
14
0
16 Feb 2025
ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations
Bowen Jiang
Yuan Yuan
Xinyi Bai
Zhuoqun Hao
Alyson Yin
Yaojie Hu
Wenyu Liao
Lyle Ungar
Camillo J Taylor
DiffM
110
2
0
16 Feb 2025
Semantics-aware Test-time Adaptation for 3D Human Pose Estimation
Qiuxia Lin
Rongyu Chen
Kerui Gu
Angela Yao
3DH, TTA
142
0
0
15 Feb 2025
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
Ruoxuan Feng
Jiangyu Hu
Wenke Xia
Tianci Gao
Ao Shen
Yuhao Sun
Bin Fang
Di Hu
93
9
0
15 Feb 2025