Simple Open-Vocabulary Object Detection with Vision Transformers

12 May 2022
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
Alexey Dosovitskiy
Aravindh Mahendran
Anurag Arnab
Mostafa Dehghani
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
    ObjD
    CLIP
    VLM
    ViT
    OCL

Papers citing "Simple Open-Vocabulary Object Detection with Vision Transformers"

47 / 247 papers shown
Title
Three ways to improve feature alignment for open vocabulary detection
Relja Arandjelović
A. Andonian
A. Mensch
Olivier J. Hénaff
Jean-Baptiste Alayrac
Andrew Zisserman
VLM
ObjD
33
19
0
23 Mar 2023
Open-Vocabulary Object Detection using Pseudo Caption Labels
Han-Cheol Cho
Won Young Jhoo
Woohyun Kang
Byungseok Roh
VLM
ObjD
29
20
0
23 Mar 2023
Detecting Everything in the Open World: Towards Universal Object Detection
Zhenyu Wang
Yali Li
Xi Chen
Ser-Nam Lim
Antonio Torralba
Hengshuang Zhao
Shengjin Wang
ObjD
VLM
32
77
0
21 Mar 2023
LERF: Language Embedded Radiance Fields
J. Kerr
C. Kim
Ken Goldberg
Angjoo Kanazawa
Matthew Tancik
15
349
0
16 Mar 2023
Unified Visual Relationship Detection with Vision and Language Models
Long Zhao
Liangzhe Yuan
Boqing Gong
Huayu Chen
Florian Schroff
Ming Yang
Hartwig Adam
Ting Liu
ObjD
32
9
0
16 Mar 2023
Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch
Aditay Tripathi
Anand Mishra
Anirban Chakraborty
20
1
0
15 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
45
431
0
14 Mar 2023
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Luting Wang
Yi Liu
Penghui Du
Zihan Ding
Yue Liao
Qiaosong Qi
Biaolong Chen
Si Liu
ObjD
VLM
70
62
0
10 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
89
1,820
0
09 Mar 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
223
320
0
08 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
156
145
0
02 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
24
42
0
01 Mar 2023
Semantic Mechanical Search with Large Vision and Language Models
Satvik Sharma
Huang Huang
K. Shivakumar
A. Imran
Ryan Hoque
Brian Ichter
Ken Goldberg
LM&Ro
VLM
29
5
0
24 Feb 2023
Scaling Robot Learning with Semantically Imagined Experience
Tianhe Yu
Ted Xiao
Austin Stone
Jonathan Tompson
Anthony Brohan
...
M. Dee
Jodilyn Peralta
Brian Ichter
Karol Hausman
F. Xia
LM&Ro
DiffM
36
144
0
22 Feb 2023
SOCRATES: Text-based Human Search and Approach using a Robot Dog
Jeongeun Park
Jefferson Silveira
Matthew K. X. J. Pan
Sungjoon Choi
16
0
0
10 Feb 2023
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding
Juncheng Li
Siliang Tang
Linchao Zhu
Wenqiao Zhang
Yi Yang
Tat-Seng Chua
Fei Wu
Y. Zhuang
BDL
24
14
0
22 Jan 2023
CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
Aviad Aberdam
David Bensaid
Alona Golts
Roy Ganz
Oren Nuriel
Royee Tichauer
Shai Mazor
Ron Litman
VLM
CLIP
24
12
0
18 Jan 2023
Learning to Detect and Segment for Open Vocabulary Object Detection
Tao Wang
Nan Li
VLM
ObjD
8
25
0
23 Dec 2022
GOOD: Exploring Geometric Cues for Detecting Objects in an Open World
Haiwen Huang
Andreas Geiger
Dan Zhang
VLM
ObjD
21
11
0
22 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM
MLLM
ObjD
21
241
0
21 Dec 2022
Benchmarking Spatial Relationships in Text-to-Image Generation
Tejas Gokhale
Hamid Palangi
Besmira Nushi
Vibhav Vineet
Eric Horvitz
Ece Kamar
Chitta Baral
Yezhou Yang
EGVM
45
66
0
20 Dec 2022
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim M. Alabdulmohsin
Filip Pavetić
VLM
45
90
0
15 Dec 2022
PØDA: Prompt-driven Zero-shot Domain Adaptation
Mohammad Fahes
Tuan-Hung Vu
Andrei Bursuc
Patrick Pérez
Raoul de Charette
VLM
38
45
0
06 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
24
10
0
05 Dec 2022
Soft Labels for Rapid Satellite Object Detection
Matt Ciolino
Grant Rosario
David A. Noever
ObjD
45
0
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
24
11
0
29 Nov 2022
Multi-Modal Few-Shot Temporal Action Detection
Sauradip Nag
Mengmeng Xu
Xiatian Zhu
Juan-Manuel Perez-Rua
Guohao Li
Yi-Zhe Song
Tao Xiang
VLM
30
6
0
27 Nov 2022
Learning Object-Language Alignments for Open-Vocabulary Object Detection
Chuang Lin
Pei Sun
Yi-Xin Jiang
Ping Luo
Lizhen Qu
Gholamreza Haffari
Zehuan Yuan
Jianfei Cai
VLM
ObjD
29
95
0
27 Nov 2022
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM
CLIP
35
3
0
21 Nov 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
21
0
0
20 Nov 2022
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
79
402
0
18 Nov 2022
OpenOOD: Benchmarking Generalized Out-of-Distribution Detection
Jingkang Yang
Pengyun Wang
Dejian Zou
Zitang Zhou
Kun Ding
...
Kaiyang Zhou
Wayne Zhang
Dan Hendrycks
Yixuan Li
Ziwei Liu
OODD
28
225
0
13 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
28
335
0
06 Oct 2022
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Weicheng Kuo
Huayu Chen
Xiuye Gu
A. Piergiovanni
A. Angelova
MLLM
VLM
ObjD
49
134
0
30 Sep 2022
REST: REtrieve & Self-Train for generative action recognition
Adrian Bulat
Enrique Sanchez
Brais Martínez
Georgios Tzimiropoulos
VLM
29
4
0
29 Sep 2022
VIPHY: Probing "Visible" Physical Commonsense Knowledge
Shikhar Singh
Ehsan Qasemi
Muhao Chen
46
6
0
15 Sep 2022
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network
Tiancheng Zhao
Peng Liu
Kyusong Lee
VLM
MLLM
ObjD
19
7
0
10 Sep 2022
Visual Recognition by Request
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
16
15
0
28 Jul 2022
Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning
Yuheng Lu
Chenfeng Xu
Xi Wei
Xiaodong Xie
M. Tomizuka
Kurt Keutzer
Shanghang Zhang
3DPC
25
20
0
05 Jul 2022
Contrastive language and vision learning of general fashion concepts
P. Chia
Giuseppe Attanasio
Federico Bianchi
Silvia Terragni
A. Magalhães
Diogo Gonçalves
C. Greco
Jacopo Tagliabue
CLIP
21
42
0
08 Apr 2022
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
S. Gadre
Mitchell Wortsman
Gabriel Ilharco
Ludwig Schmidt
Shuran Song
CLIP
LM&Ro
35
142
0
20 Mar 2022
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
35
73
0
25 Nov 2021
SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani
A. Gritsenko
Anurag Arnab
Matthias Minderer
Yi Tay
46
68
0
18 Oct 2021
ViDT: An Efficient and Effective Fully Transformer-based Object Detector
Hwanjun Song
Deqing Sun
Sanghyuk Chun
Varun Jampani
Dongyoon Han
Byeongho Heo
Wonjae Kim
Ming-Hsuan Yang
87
76
0
08 Oct 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Nayeon Lee
Weicheng Kuo
Huayu Chen
VLM
ObjD
225
899
0
28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
310
3,708
0
11 Feb 2021
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi
Huayu Chen
A. Srinivas
Rui Qian
Nayeon Lee
E. D. Cubuk
Quoc V. Le
Barret Zoph
ISeg
252
969
0
13 Dec 2020