ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIPVLM
ArXiv (abs)PDFHTMLGithub (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Title
Towards Interpreting Visual Information Processing in Vision-Language Models
Towards Interpreting Visual Information Processing in Vision-Language Models
Clement Neo
Luke Ong
Philip Torr
Mor Geva
David M. Krueger
Fazl Barez
129
15
0
09 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
157
102
0
09 Oct 2024
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Yi Ding
Bolian Li
Ruqi Zhang
MLLM
123
15
0
09 Oct 2024
An Undetectable Watermark for Generative Image Models
An Undetectable Watermark for Generative Image Models
Sam Gunn
Xuandong Zhao
Dawn Song
WIGM
103
20
0
09 Oct 2024
Conformal Prediction: A Data Perspective
Conformal Prediction: A Data Perspective
Xiaofan Zhou
Baiting Chen
Yu Gui
Lu Cheng
993
5
0
09 Oct 2024
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Zhe Li
Weihao Yuan
Yisheng He
Lingteng Qiu
Shenhao Zhu
Xiaodong Gu
Weichao Shen
Yuan Dong
Zilong Dong
Laurence T. Yang
82
10
0
09 Oct 2024
Aria: An Open Multimodal Native Mixture-of-Experts Model
Aria: An Open Multimodal Native Mixture-of-Experts Model
Dongxu Li
Yudong Liu
Haoning Wu
Yue Wang
Zhiqi Shen
...
Lihuan Zhang
Hanshu Yan
Guoyin Wang
Bei Chen
Junnan Li
MoE
112
65
0
08 Oct 2024
CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection
CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection
Mingyi Guo
Yuyang Liu
Zongying Lin
Peixi Peng
Yonghong Tian
Yonghong Tian
VLM
111
0
0
08 Oct 2024
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
Muhammad Jehanzeb Mirza
Mengjie Zhao
Zhuoyuan Mao
Sivan Doveh
Wei Lin
...
Yuki Mitsufuji
Horst Possegger
Rogerio Feris
Leonid Karlinsky
James Glass
VLM
199
1
0
08 Oct 2024
Training-free Diffusion Model Alignment with Sampling Demons
Training-free Diffusion Model Alignment with Sampling Demons
Po-Hung Yeh
Kuang-Huei Lee
Jun-Cheng Chen
86
9
0
08 Oct 2024
FLOPS: Forward Learning with OPtimal Sampling
FLOPS: Forward Learning with OPtimal Sampling
Tao Ren
Zishi Zhang
Jinyang Jiang
Guanghao Li
Zeliang Zhang
Mingqian Feng
Yijie Peng
132
1
0
08 Oct 2024
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon
M. Lee
Sangdon Park
Dongwoo Kim
78
3
0
08 Oct 2024
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon
Jong Chul Ye
DiffM
104
5
0
08 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
156
86
0
08 Oct 2024
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
Ayano Hiranaka
Shang-Fu Chen
Chieh-Hsin Lai
Dongjun Kim
Naoki Murata
Takashi Shibuya
Wei-Hsiang Liao
Shao-Hua Sun
Yuki Mitsufuji
92
2
0
07 Oct 2024
Organizing Unstructured Image Collections using Natural Language
Organizing Unstructured Image Collections using Natural Language
Mingxuan Liu
Zhun Zhong
Jun Li
Gianni Franchi
Subhankar Roy
Elisa Ricci
VLM
129
5
0
07 Oct 2024
Generating CAD Code with Vision-Language Models for 3D Designs
Generating CAD Code with Vision-Language Models for 3D Designs
Kamel Alrashedy
Pradyumna Tambwekar
Z. Zaidi
Megan Langwasser
Wei Xu
Matthew Gombolay
87
13
0
07 Oct 2024
Low-Rank Continual Personalization of Diffusion Models
Low-Rank Continual Personalization of Diffusion Models
Łukasz Staniszewski
Katarzyna Zaleska
Kamil Deja
DiffM
93
0
0
07 Oct 2024
On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning
On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning
Yongyi Su
Yushu Li
Nanqing Liu
Kui Jia
Xulei Yang
Chuan-Sheng Foo
Xun Xu
TTAAAML
115
1
0
07 Oct 2024
MROSS: Multi-Round Region-based Optimization for Scene Sketching
MROSS: Multi-Round Region-based Optimization for Scene Sketching
Yiqi Liang
Ying Liu
Dandan Long
Ruihui Li
67
0
0
05 Oct 2024
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Keda Tao
Jinjin Gu
Yulun Zhang
Xiucheng Wang
Nan Cheng
DiffM
105
4
0
05 Oct 2024
Generalizable Prompt Tuning for Vision-Language Models
Generalizable Prompt Tuning for Vision-Language Models
Qian Zhang
VLMVPVLM
114
0
0
04 Oct 2024
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Sicheng Yu
Chengkai Jin
Huanyu Wang
Zhenghao Chen
Sheng Jin
...
Zhenbang Sun
Bingni Zhang
Jiawei Wu
Hao Zhang
Qianru Sun
127
9
0
04 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
179
37
0
04 Oct 2024
Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding with LLMs
Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding with LLMs
Wei Wu
Chao Wang
L. Chen
Mingze Yin
Yiheng Zhu
Kun Fu
Jieping Ye
Hui Xiong
Zheng Wang
117
1
0
04 Oct 2024
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
Yue Zhang
Zhiyang Xu
Ying Shen
Parisa Kordjamshidi
Lifu Huang
120
8
0
04 Oct 2024
SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data
SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data
Liqian Zhang
Magdalena Fuentes
DiffMVGen
161
3
0
04 Oct 2024
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou
Yizhou Wang
Yibo Yan
Yuanhuiyi Lyu
Kening Zheng
...
Junkai Chen
Peijie Jiang
Qingbin Liu
Chang Tang
Xuming Hu
148
8
0
04 Oct 2024
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Nick Jiang
Anish Kachinthaya
Suzie Petryk
Yossi Gandelsman
VLM
93
28
0
03 Oct 2024
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li
Songtao Lu
Pin-Yu Chen
Xiaodong Cui
Meng Wang
LRM
75
6
0
03 Oct 2024
Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models
Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models
Shuoyuan Wang
Yixuan Li
Hongxin Wei
VLM
128
2
0
03 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models
ControlAR: Controllable Image Generation with Autoregressive Models
Zongming Li
Tianheng Cheng
Shoufa Chen
Peize Sun
Haocheng Shen
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
DiffM
211
19
0
03 Oct 2024
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Yuqing Wang
Tianwei Xiong
Daquan Zhou
Zhijie Lin
Yang Zhao
Bingyi Kang
Jiashi Feng
Xihui Liu
VGen
148
35
0
03 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
90
1
0
03 Oct 2024
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
Kaizhi Zheng
Xiaotong Chen
Xuehai He
Jing Gu
Linjie Li
Zhengyuan Yang
Kevin Qinghong Lin
Jianfeng Wang
Lijuan Wang
Xin Eric Wang
KELMDiffM
92
0
0
03 Oct 2024
Edge-preserving noise for diffusion models
Edge-preserving noise for diffusion models
Jente Vandersanden
Sascha Holl
Xingchang Huang
Gurprit Singh
DiffM
99
2
0
02 Oct 2024
Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker
Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker
Xinlong Hou
Sen Shen
Xueshen Li
Xinran Gao
Ziyi Huang
Steven J. Holiday
Matthew R. Cribbet
Susan W. White
Edward Sazonov
Yu Gan
103
0
0
02 Oct 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Yao Teng
Han Shi
Xian Liu
Xuefei Ning
Guohao Dai
Yu Wang
Zhenguo Li
Xihui Liu
103
16
0
02 Oct 2024
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
Hong Li
Nanxi Li
Yuanjie Chen
Jianbin Zhu
Qinlu Guo
Cewu Lu
Yong-Lu Li
MLLM
103
1
0
02 Oct 2024
KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
Pouyan Navard
Amin Karimi Monsefi
Mengxi Zhou
Wei-Lun Chao
Alper Yilmaz
R. Ramnath
DiffM
118
3
0
02 Oct 2024
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
Tung-Yu Wu
Pei-Yu Lo
ReLMLRM
115
2
0
02 Oct 2024
Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
Saurav Jha
Shiqi Yang
Masato Ishii
Mengjie Zhao
Christian Simon
Muhammad Jehanzeb Mirza
Dong Gong
Lina Yao
Shusuke Takahashi
Yuki Mitsufuji
DiffM
139
3
0
01 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
166
0
0
01 Oct 2024
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Qiaojun Yu
Siyuan Huang
Xibin Yuan
Zhengkai Jiang
Ce Hao
...
Junbo Wang
Liu Liu
Hongsheng Li
Peng Gao
Cewu Lu
117
3
0
30 Sep 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
114
9
0
30 Sep 2024
Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization
Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization
Jingjing Chen
Hongjie Fang
Hao-Shu Fang
Cewu Lu
76
2
0
30 Sep 2024
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
Yangtao Chen
Zixuan Chen
Junhui Yin
Jing Huo
Pinzhuo Tian
Jieqi Shi
Yang Gao
LM&Ro
118
3
0
30 Sep 2024
Replace Anyone in Videos
Replace Anyone in Videos
Xiang Wang
Shiwei Zhang
Haonan Qiu
Ruihang Chu
Zekun Li
Yuanxing Zhang
Changxin Gao
Yuehuan Wang
Chunhua Shen
Nong Sang
VGenDiffM
116
1
0
30 Sep 2024
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Jihai Zhang
Xiaoye Qu
Tong Zhu
Yu Cheng
89
9
0
28 Sep 2024
MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation
MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation
Taha Koleilat
Hojat Asgariandehkordi
H. Rivaz
Yiming Xiao
MedImVLM
87
9
0
28 Sep 2024
Previous
123...212223...333435
Next