Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.01830
Cited By
Learning to Name Classes for Vision and Language Models
4 April 2023
Sarah Parisot
Yongxin Yang
Jingyu Sun
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Learning to Name Classes for Vision and Language Models"
36 / 36 papers shown
Title
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Rinon Gal
Yuval Alaluf
Yuval Atzmon
Or Patashnik
Amit H. Bermano
Gal Chechik
Daniel Cohen-Or
168
1,898
0
02 Aug 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
94
314
0
12 May 2022
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Zhaowei Cai
Gukyeong Kwon
Avinash Ravichandran
Erhan Bas
Zhuowen Tu
Rahul Bhotika
Stefano Soatto
ObjD
MLLM
VLM
62
50
0
12 Apr 2022
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
Yu Du
Fangyun Wei
Zihe Zhang
Miaojing Shi
Yue Gao
Guoqi Li
VPVLM
VLM
81
335
0
28 Mar 2022
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
Kaican Li
Kai Chen
Haoyu Wang
Lanqing Hong
Chao Ye
...
Chunjing Xu
Dit-Yan Yeung
Xiaodan Liang
Zhenguo Li
Hang Xu
86
103
0
15 Mar 2022
Conditional Prompt Learning for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VLM
CLIP
VPVLM
148
1,359
0
10 Mar 2022
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLM
VLM
102
381
0
08 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
136
1,067
0
07 Dec 2021
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Tengjiao Wang
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
262
452
0
04 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM
CLIP
214
581
0
02 Dec 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
115
560
0
15 Nov 2021
CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
Andreas Fürst
Elisabeth Rumetshofer
Johannes Lehner
Viet-Hung Tran
Fei Tang
...
David P. Kreil
Michael K Kopp
Günter Klambauer
Angela Bitto-Nemling
Sepp Hochreiter
VLM
CLIP
306
104
0
21 Oct 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM
CLIP
156
458
0
11 Oct 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
283
224
0
24 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
513
2,422
0
02 Sep 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM
SyDa
239
4,004
0
28 Jul 2021
SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving
Jianhua Han
Xiwen Liang
Hang Xu
Kai Chen
Lanqing Hong
...
Chao Ye
Wei Zhang
Zhenguo Li
Xi Liang
Chunjing Xu
61
78
0
21 Jun 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
190
890
0
26 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
600
4,099
0
18 Apr 2021
Factual Probing Is [MASK]: Learning vs. Learning to Recall
Zexuan Zhong
Dan Friedman
Danqi Chen
56
412
0
12 Apr 2021
Towards Open World Object Detection
K. J. Joseph
Salman Khan
Fahad Shahbaz Khan
V. Balasubramanian
ObjD
94
462
0
03 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.0K
29,926
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
469
3,906
0
11 Feb 2021
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li
Percy Liang
252
4,305
0
01 Jan 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
404
1,981
0
31 Dec 2020
Open-Vocabulary Object Detection Using Captions
Alireza Zareian
Kevin Dela Rosa
Derek Hao Hu
Shih-Fu Chang
VLM
ObjD
139
433
0
20 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
684
41,563
0
22 Oct 2020
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick
Hinrich Schütze
132
976
0
15 Sep 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
73
36
0
12 May 2020
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta
Piotr Dollár
Ross B. Girshick
ISeg
VLM
107
1,379
0
08 Aug 2019
Recent Advances in Open Set Recognition: A Survey
Chuanxing Geng
Sheng-Jun Huang
Songcan Chen
BDL
ObjD
147
771
0
21 Nov 2018
Learning Visual Features from Large Weakly Supervised Data
Armand Joulin
Laurens van der Maaten
Allan Jabri
Nicolas Vasilache
SSL
109
409
0
06 Nov 2015
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.7K
100,575
0
04 Sep 2014
Describing Textures in the Wild
Mircea Cimpoi
Subhransu Maji
Iasonas Kokkinos
S. Mohamed
Andrea Vedaldi
3DV
151
2,695
0
14 Nov 2013
Fine-Grained Visual Classification of Aircraft
Subhransu Maji
Esa Rahtu
Arno Solin
Matthew Blaschko
Andrea Vedaldi
132
2,272
0
21 Jun 2013
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIP
VGen
165
6,170
0
03 Dec 2012
1