arXiv:2201.02605
Detecting Twenty-thousand Classes using Image-level Supervision
7 January 2022
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Philipp Krähenbühl
Ishan Misra
CLIP
VLM
Papers citing "Detecting Twenty-thousand Classes using Image-level Supervision"
50 / 128 papers shown
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
48
83
0
06 Dec 2023
Learning Generalizable Manipulation Policies with Object-Centric 3D Representations
Yifeng Zhu
Zhenyu Jiang
Peter Stone
Yuke Zhu
3DPC
24
43
0
22 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
26
33
0
20 Oct 2023
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
Shashanka Venkataramanan
Mamshad Nayeem Rizve
João Carreira
Yuki M. Asano
Yannis Avrithis
SSL
29
18
0
12 Oct 2023
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
Wen-Hsuan Chu
Adam W. Harley
P. Tokmakov
Achal Dave
Leonidas J. Guibas
Katerina Fragkiadaki
VLM
28
7
0
10 Oct 2023
FLIP: Cross-domain Face Anti-spoofing with Language Guidance
K. Srivatsan
Muzammal Naseer
Karthik Nandakumar
CVBM
47
44
0
28 Sep 2023
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
Jiahao Xie
Wei Li
Xiangtai Li
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
DiffM
VLM
69
35
0
22 Sep 2023
Detect Everything with Few Examples
Xinyu Zhang
Yuting Wang
Abdeslam Boularias
ObjD
VLM
29
13
0
22 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
25
9
0
05 Sep 2023
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
ISeg
3DPC
VLM
52
46
0
01 Sep 2023
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
40
19
0
28 Aug 2023
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
36
0
0
24 Aug 2023
Structured World Models from Human Videos
Russell Mendonca
Shikhar Bahl
Deepak Pathak
LM&Ro
41
86
0
21 Aug 2023
An Examination of the Compositionality of Large Generative Vision-Language Models
Teli Ma
Rong Li
Junwei Liang
CoGe
34
2
0
21 Aug 2023
ARGUS: Visualization of AI-Assisted Task Guidance in AR
Sonia Castelo
Joao Rulff
Erin McGowan
Bea Steers
Guande Wu
...
Qinghong Sun
Huy Q. Vo
J. P. Bello
M. Krone
Claudio Silva
34
18
0
11 Aug 2023
Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots
Yoshiki Obinata
Naoaki Kanazawa
Kento Kawaharazuka
Iori Yanokura
Soon-Hyeob Kim
K. Okada
Masayuki Inaba
LM&Ro
19
7
0
07 Aug 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
85
224
0
07 Jul 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
35
14
0
20 Jun 2023
HomeRobot: Open-Vocabulary Mobile Manipulation
Sriram Yenamandra
A. Ramachandran
Karmesh Yadav
Austin S. Wang
Mukul Khanna
...
Devendra Singh Chaplot
Dhruv Batra
Roozbeh Mottaghi
Yonatan Bisk
Chris Paxton
LM&Ro
44
79
0
20 Jun 2023
GeneCIS: A Benchmark for General Conditional Image Similarity
S. Vaze
Nicolas Carion
Ishan Misra
VLM
DiffM
29
26
0
13 Jun 2023
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
30
30
0
30 May 2023
Contextual Object Detection with Multimodal Large Language Models
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
32
78
0
29 May 2023
VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations
Nikolaos Tsagkas
Oisin Mac Aodha
Chris Xiaoxuan Lu
VLM
27
25
0
21 May 2023
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Zhangxuan Gu
Zhuoer Xu
Haoxing Chen
Jun Lan
Changhua Meng
Weiqiang Wang
19
4
0
16 May 2023
IMAGINATOR: Pre-Trained Image+Text Joint Embeddings using Word-Level Grounding of Images
Varuna Krishna
S. Suryavardan
Shreyash Mishra
Sathyanarayanan Ramamoorthy
Parth Patwa
Megha Chakraborty
Aman Chadha
Amitava Das
Amit P. Sheth
VLM
25
3
0
12 May 2023
Affordances from Human Videos as a Versatile Representation for Robotics
Shikhar Bahl
Russell Mendonca
Lili Chen
Unnat Jain
Deepak Pathak
41
164
0
17 Apr 2023
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Shuhuai Ren
Aston Zhang
Yi Zhu
Shuai Zhang
Shuai Zheng
Mu Li
Alexander J. Smola
Xu Sun
VPVLM
VLM
21
28
0
10 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
47
74
0
10 Apr 2023
V3Det: Vast Vocabulary Visual Detection Dataset
Jiaqi Wang
Pan Zhang
Tao Chu
Yuhang Cao
Yujie Zhou
Tong Wu
Bin Wang
Conghui He
Dahua Lin
VLM
ObjD
29
52
0
07 Apr 2023
Navigating to Objects Specified by Images
Jacob Krantz
Théophile Gervet
Karmesh Yadav
Austin S. Wang
Chris Paxton
Roozbeh Mottaghi
Dhruv Batra
Jitendra Malik
Stefan Lee
Devendra Singh Chaplot
44
36
0
03 Apr 2023
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang
Runyu Ding
Weipeng Deng
Zhe Wang
Xiaojuan Qi
20
62
0
03 Apr 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
29
23
0
29 Mar 2023
ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
Yongqi An
Xu Zhao
Tao Yu
Haiyun Guo
Chaoyang Zhao
Ming Tang
Jinqiao Wang
40
20
0
26 Mar 2023
Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection
Hwanjun Song
Jihwan Bang
VLM
ObjD
29
14
0
25 Mar 2023
Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
Anurag Ghosh
Dinesh Reddy Narapureddy
Christoph Mertz
S. Narasimhan
28
4
0
25 Mar 2023
Three ways to improve feature alignment for open vocabulary detection
Relja Arandjelović
A. Andonian
A. Mensch
Olivier J. Hénaff
Jean-Baptiste Alayrac
Andrew Zisserman
VLM
ObjD
33
19
0
23 Mar 2023
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection
Arushi Rai
Adriana Kovashka
27
0
0
16 Mar 2023
Zero-Shot Object Searching Using Large-scale Object Relationship Prior
Hongyi Chen
Ruinian Xu
Shuo Cheng
Patricio A. Vela
Danfei Xu
LM&Ro
26
5
0
10 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
89
1,820
0
09 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
12
31
0
04 Mar 2023
ALAN: Autonomously Exploring Robotic Agents in the Real World
Russell Mendonca
Shikhar Bahl
Deepak Pathak
LM&Ro
33
20
0
13 Feb 2023
Zero-shot Image-to-Image Translation
Gaurav Parmar
Krishna Kumar Singh
Richard Y. Zhang
Yijun Li
Jingwan Lu
Jun-Yan Zhu
DiffM
24
431
0
06 Feb 2023
OvarNet: Towards Open-vocabulary Object Attribute Recognition
Keyan Chen
Xiaolong Jiang
Yao Hu
Xu Tang
Yan Gao
Jianqi Chen
Weidi Xie
VLM
ObjD
37
40
0
23 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
24
4
0
05 Jan 2023
PACO: Parts and Attributes of Common Objects
Vignesh Ramanathan
Anmol Kalia
Vladan Petrovic
Yiqian Wen
Baixue Zheng
...
Abhishek Kadian
Amir Mousavi
Yi-Zhe Song
Abhimanyu Dubey
D. Mahajan
VLM
19
94
0
04 Jan 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
Jianzong Wu
Xiangtai Li
Henghui Ding
Xia Li
Guangliang Cheng
Yu Tong
Chen Change Loy
VLM
85
31
0
02 Jan 2023
Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection
Sanghyun Woo
Kwanyong Park
Seoung Wug Oh
In So Kweon
Joon-Young Lee
VLM
VOS
28
6
0
20 Dec 2022
VideoDex: Learning Dexterity from Internet Videos
Kenneth Shaw
Shikhar Bahl
Deepak Pathak
19
89
0
08 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
F. Khan
CLIP
VLM
31
148
0
06 Dec 2022
Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar
Jason Clemons
Rangharajan Venkatesan
S. Keckler
M. Horowitz
24
2
0
06 Dec 2022