Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (29177★)
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 1,722 papers shown
Title
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
Jungbin Cho
Junwan Kim
Jisoo Kim
Minseo Kim
Mingu Kang
S. Hong
Tae-Hyun Oh
Youngjae Yu
VGen
154
1
0
29 Nov 2024
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
Weixin Mao
Weiheng Zhong
Zhou Jiang
Dong Fang
Zhongyue Zhang
...
Fan Jia
Tiancai Wang
Haoqiang Fan
Osamu Yoshie
Osamu Yoshie
216
7
0
29 Nov 2024
On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng
Shaohan Huang
Ziyu Zhu
Xintong Zhang
Wayne Xin Zhao
Zhongzhi Luan
Bo Dai
Zhenliang Zhang
VLM
159
5
0
29 Nov 2024
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
Kaustubh Ponkshe
Raghav Singhal
Eduard A. Gorbunov
Alexey Tumanov
Samuel Horváth
Praneeth Vepakomma
254
7
0
29 Nov 2024
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang
Zilin Gao
Qilong Wang
Zhaofeng Chen
P. Li
Q. Hu
151
1
0
28 Nov 2024
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou
Jiachun Jin
Chang Liu
Ye Ma
Jian Jia
Quan Chen
Peng Jiang
Zhijie Deng
Zhijie Deng
DiffM
VGen
VLM
222
12
0
28 Nov 2024
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
Mohamed Fazli Mohamed Imam
Rufael Fedaku Marew
Jameel Hassan
Mustansar Fiaz
Alham Fikri Aji
Hisham Cholakkal
VLM
517
1
0
28 Nov 2024
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou
Symeon Papadopoulos
I. Kompatsiaris
Efstratios Gavves
152
1
0
28 Nov 2024
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
Haijie Li
Y. Wu
Jiarui Meng
Qiankun Gao
Zhiyao Zhang
Ronggang Wang
Jian Zhang
ISeg
132
4
0
28 Nov 2024
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
Chancharik Mitra
Brandon Huang
Tianning Chai
Zhiqiu Lin
Assaf Arbelle
Rogerio Feris
Leonid Karlinsky
Trevor Darrell
Deva Ramanan
Roei Herzig
VLM
340
4
0
28 Nov 2024
BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis
Seong-Eun Hong
Soobin Lim
Juyeong Hwang
Minwook Chang
Hyeongyeop Kang
166
1
0
28 Nov 2024
Type-R: Automatically Retouching Typos for Text-to-Image Generation
Wataru Shimoda
Naoto Inoue
Daichi Haraguchi
Hayato Mitani
S. Uchida
Kota Yamaguchi
DiffM
195
0
0
27 Nov 2024
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal
Xiang Yue
Erion Plaku
Ziyu Yao
LRM
181
1
0
27 Nov 2024
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li
Zhengkang Xiang
Joseph West
Kourosh Khoshelham
ObjD
VLM
172
1
0
27 Nov 2024
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
Trong-Thuan Nguyen
Pha Nguyen
J. Cothren
Alper Yilmaz
Khoa Luu
153
1
0
27 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
165
10
0
27 Nov 2024
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu
Jiajian Li
Yichen Jiang
Niranjan Sujay
Zhiyong Yang
Juexiao Zhang
John Abanes
Jing Zhang
Chen Feng
158
4
0
26 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
184
2
0
26 Nov 2024
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
Shuyu Yang
Yaxiong Wang
Li Zhu
Zhedong Zheng
179
7
0
26 Nov 2024
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia
Jishuo Li
Zhiwei Lin
Xinhao Wang
Yansen Wang
Ming-Hsuan Yang
VLM
138
3
0
26 Nov 2024
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao
S. Sang
Tiancheng Zhi
Jing Liu
Qing Yan
Linjie Luo
Bo Yuan
Bo Yuan
VLM
185
2
0
26 Nov 2024
Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions
Nicolai Hermann
Jorge Condor
Piotr Didyk
3DV
149
0
0
26 Nov 2024
Integrating Dual Prototypes for Task-Wise Adaption in Pre-Trained Model-Based Class-Incremental Learning
Zhiming Xu
Steve Yang
Baile Xu
Jian Zhao
Furao Shen
CLL
176
0
0
26 Nov 2024
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan
Nithin Gopalakrishnan Nair
Jay N. Paranjape
Vishal M. Patel
DiffM
151
1
0
26 Nov 2024
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim
Dayun Ju
Woojung Han
Ming-Hsuan Yang
Seong Jae Hwang
VLM
VOS
257
1
0
26 Nov 2024
Task Singular Vectors: Reducing Task Interference in Model Merging
Antonio Andrea Gargiulo
Donato Crisostomi
Maria Sofia Bucarelli
Simone Scardapane
Fabrizio Silvestri
Emanuele Rodolà
MoMe
147
16
0
26 Nov 2024
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
Zhichao Zhang
Wei Sun
Xinyue Li
Yunhao Li
Qihang Ge
...
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
EGVM
237
1
0
25 Nov 2024
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
Sizai Hou
Songze Li
Duanyi Yao
AAML
192
0
0
25 Nov 2024
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen
Siyuan Liang
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Qichuan Geng
Xiaochun Cao
FAtt
189
4
0
25 Nov 2024
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis
Bo Liu
K. Zou
Liming Zhan
Zexin Lu
Xiaoyu Dong
Yidi Chen
Chengqiang Xie
Jiannong Cao
Xiao-Ming Wu
Huazhu Fu
184
2
0
25 Nov 2024
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen
Yushi Lan
Shangchen Zhou
Tengfei Wang
Xingang Pan
212
6
0
25 Nov 2024
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang
Yihua Zhang
Ruizheng Bai
Yue Zhao
Sijia Liu
Zhuowen Tu
AAML
PICV
157
2
0
25 Nov 2024
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Linqing Zhong
Chen Gao
Zihan Ding
Yue Liao
Si Liu
Shifeng Zhang
Xu Zhou
Si Liu
LRM
163
7
0
25 Nov 2024
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go
Byeongjun Park
Jiho Jang
Jin-Young Kim
Soonwoo Kwon
Changick Kim
3DGS
203
3
0
25 Nov 2024
Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering
Zhicheng Zhao
Changfu Zhou
Yu Zhang
Chenglong Li
Xiaoliang Ma
Jin Tang
120
0
0
24 Nov 2024
PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation
Ziyao Zeng
Jingcheng Ni
Daniel Wang
Patrick Rim
Younjoon Chung
Fengyu Yang
Byung-Woo Hong
A. Wong
DiffM
MDE
256
2
0
24 Nov 2024
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
Jiaqi Wang
Yifei Gao
Jitao Sang
MLLM
181
2
0
24 Nov 2024
Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors
Soumava Paul
Prakhar Kaushik
Alan Yuille
3DGS
DiffM
516
0
0
24 Nov 2024
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
Sule Bai
Yong-Jin Liu
Yifei Han
Haoji Zhang
Yansong Tang
VLM
295
8
0
24 Nov 2024
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
Teng Zhou
Xiaoyu Zhang
Yongchuan Tang
MLLM
DiffM
187
1
0
24 Nov 2024
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu
Wei Chow
Zhongqi Yue
Kaihang Pan
Yang Wu
Xiaoyang Wan
Juncheng Billy Li
Siliang Tang
Hao Zhang
Yueting Zhuang
DiffM
204
29
0
24 Nov 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
154
7
0
23 Nov 2024
MUNBa: Machine Unlearning via Nash Bargaining
Jing Wu
Mehrtash Harandi
MU
139
5
0
23 Nov 2024
Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
Anubhav Jain
Yuya Kobayashi
Takashi Shibuya
Yuhta Takida
N. Memon
Julian Togelius
Yuki Mitsufuji
DiffM
187
2
0
23 Nov 2024
LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces
Anxhelo Diko
Antonino Furnari
Luigi Cinque
G. Farinella
326
0
0
23 Nov 2024
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
Datao Tang
Xiangyong Cao
Xuan Wu
Jialin Li
Jing Yao
Xueru Bai
Deyu Meng
Yin Li
Deyu Meng
DiffM
141
8
0
23 Nov 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
156
5
0
23 Nov 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
196
3
0
22 Nov 2024
Adversarial Prompt Distillation for Vision-Language Models
Lin Luo
Xin Wang
Bojia Zi
Shihao Zhao
Xingjun Ma
Yu-Gang Jiang
AAML
VLM
150
4
0
22 Nov 2024
RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency
Wentao Huang
Meilong Xu
Xiaoling Hu
Shahira Abousamra
Aniruddha Ganguly
...
Prateek Prasanna
Tahsin M. Kurc
Joel H. Saltz
Michael L. Miller
Chong Chen
118
0
0
22 Nov 2024
Previous
1
2
3
...
17
18
19
...
33
34
35
Next