ResearchTrend.AI
Simple Open-Vocabulary Object Detection with Vision Transformers

12 May 2022
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
Alexey Dosovitskiy
Aravindh Mahendran
Anurag Arnab
Mostafa Dehghani
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
    ObjD
    CLIP
    VLM
    ViT
    OCL

Papers citing "Simple Open-Vocabulary Object Detection with Vision Transformers"

50 / 247 papers shown
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving
H. Zunair
Md. Shakib Khan
A. Ben Hamza
27
6
0
14 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
47
69
0
10 Jan 2024
Multimodal Data Curation via Object Detection and Filter Ensembles
Tzu-Heng Huang
Changho Shin
Sui Jiet Tay
Dyah Adila
Frederic Sala
34
5
0
05 Jan 2024
FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
Xingxing Zuo
Pouya Samangouei
Yunwen Zhou
Yan Di
Mingyang Li
3DGS
19
46
0
03 Jan 2024
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao
Long Zhao
Vijay Kumar B.G
Yumin Suh
Dimitris N. Metaxas
Manmohan Chandraker
S. Schulter
ObjD
VLM
39
5
0
29 Dec 2023
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
49
4
0
29 Dec 2023
Independence in the Home: A Wearable Interface for a Person with Quadriplegia to Teleoperate a Mobile Manipulator
Akhil Padmanabha
Janavi Gupta
Chen Chen
Jehan Yang
Vy Nguyen
Douglas J. Weber
Carmel Majidi
Zackory M. Erickson
33
12
0
22 Dec 2023
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
60
122
0
21 Dec 2023
WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
Huy Le
Tung Kieu
Anh Nguyen
Ngan Le
VGen
29
1
0
15 Dec 2023
Foundation Models in Robotics: Applications, Challenges, and the Future
Roya Firoozi
Johnathan Tucker
Stephen Tian
Anirudha Majumdar
Jiankai Sun
...
Brian Ichter
Danny Driess
Jiajun Wu
Cewu Lu
Mac Schwager
LM&Ro
AI4CE
LRM
VLM
37
140
0
13 Dec 2023
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun
Runjia Li
Philip H. S. Torr
Xiuye Gu
Siyang Li
VLM
CLIP
36
32
0
12 Dec 2023
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection
Joonhyun Jeong
Geondo Park
Jayeon Yoo
Hyungsik Jung
Heesu Kim
VLM
ObjD
41
10
0
12 Dec 2023
OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
Hu Zhang
Jianhua Xu
Tao Tang
Haiyang Sun
Xin Yu
Zi Huang
Kaicheng Yu
ObjD
3DPC
35
12
0
12 Dec 2023
PixLore: A Dataset-driven Approach to Rich Image Captioning
Diego Bonilla
VLM
14
0
0
08 Dec 2023
Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen
Chaoyou Fu
Peixian Chen
Mengdan Zhang
Ke Li
Xing Sun
Yunsheng Wu
Shaohui Lin
Rongrong Ji
VLM
ObjD
48
33
0
04 Dec 2023
Language-conditioned Detection Transformer
Jang Hyun Cho
Philipp Krahenbuhl
VLM
ObjD
47
1
0
29 Nov 2023
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
Yingdong Hu
Fanqi Lin
Tong Zhang
Li Yi
Yang Gao
LM&Ro
91
101
0
29 Nov 2023
Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion
Hao Zhou
Tiancheng Shen
Xu Yang
Hai Huang
Xiangtai Li
Lu Qi
Ming-Hsuan Yang
89
12
0
06 Nov 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu
Haonan Chang
E. Jing
Abdeslam Boularias
Kostas Bekris
18
54
0
06 Nov 2023
A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction
Nicholas Walker
Stefan Ultes
Pierre Lison
LM&Ro
56
1
0
03 Nov 2023
Object-centric Video Representation for Long-term Action Anticipation
Ce Zhang
Changcheng Fu
Shijie Wang
Nakul Agarwal
Kwonjoon Lee
Chiho Choi
Chen Sun
23
14
0
31 Oct 2023
Spuriosity Rankings for Free: A Simple Framework for Last Layer Retraining Based on Object Detection
Mohammad Azizmalayeri
Reza Abbasi
Amir Hosein Haji Mohammad Rezaie
Reihaneh Zohrabi
Mahdi Amiri
M. T. Manzuri
M. Rohban
19
0
0
31 Oct 2023
LP-OVOD: Open-Vocabulary Object Detection by Linear Probing
Chau Pham
Truong Vu
Khoi Duc Minh Nguyen
ObjD
22
16
0
26 Oct 2023
One-Shot Imitation Learning: A Pose Estimation Perspective
Pietro Vitiello
Kamil Dreczkowski
Edward Johns
29
18
0
18 Oct 2023
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment
Dhruba Ghosh
Hanna Hajishirzi
Ludwig Schmidt
9
137
0
17 Oct 2023
Towards Robust Multi-Modal Reasoning via Model Selection
Xiangyan Liu
Rongxue Li
Wei Ji
Tao Lin
LLMAG
LRM
37
3
0
12 Oct 2023
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
Wen-Hsuan Chu
Adam W. Harley
P. Tokmakov
Achal Dave
Leonidas J. Guibas
Katerina Fragkiadaki
VLM
30
7
0
10 Oct 2023
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Mihir Prabhudesai
Anirudh Goyal
Deepak Pathak
Katerina Fragkiadaki
37
111
0
05 Oct 2023
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
24
33
0
04 Oct 2023
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods
Samyadeep Basu
Mehrdad Saberi
S. Bhardwaj
Atoosa Malemir Chegini
Daniela Massiceti
Maziar Sanjabi
S. Hu
S. Feizi
55
16
0
03 Oct 2023
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
Size Wu
Wenwei Zhang
Lumin Xu
Sheng Jin
Xiangtai Li
Wentao Liu
Chen Change Loy
CLIP
VLM
26
69
0
02 Oct 2023
DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection
Shilin Xu
Xiangtai Li
Size Wu
Wenwei Zhang
Yunhai Tong
Chen Change Loy
ObjD
VLM
31
0
0
02 Oct 2023
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
17
3
0
29 Sep 2023
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Sanghwan Kim
Hao Tang
Fisher Yu
VLM
CLIP
21
4
0
28 Sep 2023
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Haonan Chang
Kowndinya Boyalakuntla
Shiyang Lu
Siwei Cai
E. Jing
...
Shijie Geng
Adeeb Abbas
Lifeng Zhou
Kostas Bekris
Abdeslam Boularias
14
26
0
27 Sep 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCL
VLM
27
3
0
26 Sep 2023
Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving
Mahyar Najibi
Jingwei Ji
Yin Zhou
C. Qi
Xinchen Yan
Scott Ettinger
Drago Anguelov
19
27
0
25 Sep 2023
SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation Systems
Joonhyung Lee
Sangbeom Park
Jeongeun Park
Kyungjae Lee
Sungjoon Choi
36
2
0
25 Sep 2023
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
Jiahao Xie
Wei Li
Xiangtai Li
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
DiffM
VLM
69
35
0
22 Sep 2023
Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill
Wenzhe Cai
Siyuan Huang
Guangran Cheng
Yuxing Long
Peng Gao
Changyin Sun
Hao Dong
LM&Ro
25
41
0
19 Sep 2023
Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping
Adam Rashid
Satvik Sharma
C. Kim
J. Kerr
L. Chen
Angjoo Kanazawa
Ken Goldberg
62
85
0
14 Sep 2023
Zero-Shot Visual Classification with Guided Cropping
Piyapat Saranrittichai
Mauricio Muñoz
Volker Fischer
Chaithanya Kumar Mummadi
VLM
32
1
0
12 Sep 2023
Towards Content-based Pixel Retrieval in Revisited Oxford and Paris
G. An
Woo Jae Kim
Saelyne Yang
Rong Li
Yuchi Huo
Sung-eui Yoon
VLM
34
4
0
11 Sep 2023
Physically Grounded Vision-Language Models for Robotic Manipulation
Jensen Gao
Bidipta Sarkar
F. Xia
Ted Xiao
Jiajun Wu
Brian Ichter
Anirudha Majumdar
Dorsa Sadigh
LM&Ro
20
113
0
05 Sep 2023
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking
Homanga Bharadhwaj
Jay Vakil
Mohit Sharma
Abhi Gupta
Shubham Tulsiani
Vikash Kumar
LM&Ro
21
116
0
05 Sep 2023
EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
Cheng Shi
Sibei Yang
VLM
ObjD
38
38
0
03 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
23
27
0
02 Sep 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
32
5
0
30 Aug 2023
Video OWL-ViT: Temporally-consistent open-world localization in video
G. Heigold
Matthias Minderer
A. Gritsenko
Alex Bewley
Daniel Keysers
Mario Lučić
Fisher Yu
Thomas Kipf
VLM
19
14
0
22 Aug 2023
UnLoc: A Unified Framework for Video Localization Tasks
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
31
53
0
21 Aug 2023