2205.06230
Simple Open-Vocabulary Object Detection with Vision Transformers
12 May 2022
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
Alexey Dosovitskiy
Aravindh Mahendran
Anurag Arnab
Mostafa Dehghani
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
Papers citing
"Simple Open-Vocabulary Object Detection with Vision Transformers"
50 / 247 papers shown
Title
Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection
Daniel Bogdoll
Rajanikant Ananta
Abeyankar Giridharan
Isabel Moore
Gregory Stevens
Henry X. Liu
VLM
51
0
0
30 Apr 2025
Beyond Task and Motion Planning: Hierarchical Robot Planning with General-Purpose Policies
Benned Hedegaard
Ziyi Yang
Yichen Wei
Ahmed Jaafar
Stefanie Tellex
George Konidaris
Naman Shah
26
0
0
24 Apr 2025
MorphoNavi: Aerial-Ground Robot Navigation with Object Oriented Mapping in Digital Twin
Sausar Karaf
Mikhail Martynov
Oleg Sautenkov
Zhanibek Darush
Dzmitry Tsetserukou
47
1
0
23 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
43
0
0
19 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
Jingyang Zhang
...
Jiahui Lv
Ziqiang Liu
Tengyuan Shi
Qingjie Liu
Yixuan Wang
MLLM
VLM
63
1
0
13 Apr 2025
ZS-VCOS: Zero-Shot Outperforms Supervised Video Camouflaged Object Segmentation
Wenqi Guo
Shan Du
VLM
60
0
0
10 Apr 2025
Unlocking Open-Set Language Accessibility in Vision Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
55
0
0
14 Mar 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin
Mike Zheng Shou
VGen
159
1
0
12 Mar 2025
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou
Manli Tao
Chaoyang Zhao
Haiyun Guo
Honghui Dong
Ming Tang
J. T. Wang
46
0
0
11 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
46
0
0
10 Mar 2025
Shaken, Not Stirred: A Novel Dataset for Visual Understanding of Glasses in Human-Robot Bartending Tasks
Lukás Gajdosech
Hassan Ali
Jan-Gerrit Habekost
Martin Madaras
Matthias Kerzel
Stefan Wermter
54
0
0
06 Mar 2025
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
Wenqi Guo
Yiyang Du
Shan Du
75
1
0
04 Mar 2025
Language-Guided Object Search in Agricultural Environments
Advaith Balaji
Saket Pradhan
Dmitry Berenson
LM&Ro
50
0
0
03 Mar 2025
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Shivansh Patel
Xinchen Yin
Wenlong Huang
Shubham Garg
H. Nayyeri
Li Fei-Fei
Svetlana Lazebnik
Yongqian Li
92
0
0
12 Feb 2025
Towards Wearable Interfaces for Robotic Caregiving
Akhil Padmanabha
Carmel Majidi
Zackory M. Erickson
67
1
0
07 Feb 2025
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Connor Schenck
Isaac Reid
M. Jacob
Alex Bewley
Joshua Ainslie
...
Matthias Minderer
Dmitry Kalashnikov
Jonathan Tompson
Vikas Sindhwani
Krzysztof Choromanski
66
1
0
04 Feb 2025
Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection
Xiangyu Gao
Yu Dai
Benliu Qiu
Hongliang Li
Heqian Qiu
Hongliang Li
ObjD
VLM
151
0
0
28 Jan 2025
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
37
0
0
25 Dec 2024
SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
S. Nagendra
Kashif Rashid
Chaopeng Shen
Daniel Kifer
VLM
73
2
0
16 Dec 2024
Towards Real-Time Open-Vocabulary Video Instance Segmentation
Bin Yan
Martin Sundermeyer
D. Tan
Huchuan Lu
F. Tombari
VLM
VOS
92
1
0
05 Dec 2024
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
Haozhan Shen
Kangjia Zhao
Tiancheng Zhao
Ruochen Xu
Zilun Zhang
Mingwei Zhu
Jianwei Yin
97
4
0
25 Nov 2024
TrojanRobot: Physical-World Backdoor Attacks Against VLM-based Robotic Manipulation
X. U. Wang
Hewen Pan
Hangtao Zhang
Minghui Li
Shengshan Hu
...
Peijin Guo
Yichen Wang
Wei Wan
Aishan Liu
L. Zhang
AAML
85
4
0
18 Nov 2024
Evaluating the Generation of Spatial Relations in Text and Image Generative Models
Shang Hong Sim
Clarence Lee
A. Tan
Cheston Tan
EGVM
38
2
0
12 Nov 2024
A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model
Panwen Hu
Nan Xiao
Feifei Li
Yongquan Chen
Rui Huang
VGen
OffRL
57
3
0
07 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
50
2
0
05 Nov 2024
Vocal Sandbox: Continual Learning and Adaptation for Situated Human-Robot Collaboration
J. Grannen
Siddharth Karamcheti
Suvir Mirchandani
Percy Liang
Dorsa Sadigh
39
0
0
04 Nov 2024
ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images
Timing Yang
Yuanliang Ju
Li Yi
3DPC
34
3
0
31 Oct 2024
Domain Adaptation with a Single Vision-Language Embedding
Mohammad Fahes
Tuan-Hung Vu
Andrei Bursuc
Patrick Pérez
Raoul de Charette
VLM
28
0
0
28 Oct 2024
Synthetica: Large Scale Synthetic Data for Robot Perception
Ritvik Singh
Jingzhou Liu
Karl Van Wyk
Yu-Wei Chao
Jean-Francois Lafleche
Florian Shkurti
Nathan D. Ratliff
Ankur Handa
25
1
0
28 Oct 2024
EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering
Kai Cheng
Zhengyuan Li
Xingpeng Sun
Byung-Cheol Min
Amrit Singh Bedi
Aniket Bera
43
2
0
26 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
43
0
0
26 Oct 2024
Zero-shot Object Navigation with Vision-Language Models Reasoning
Congcong Wen
Yisiyuan Huang
Hao Huang
Yanjia Huang
Shuaihang Yuan
Yu Hao
Hui Lin
Yu-Shen Liu
Yi Fang
LM&Ro
50
7
0
24 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
35
3
0
21 Oct 2024
Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability
Yusuke Hosoya
Masanori Suganuma
Takayuki Okatani
ObjD
16
0
0
20 Oct 2024
In-Context Learning Enables Robot Action Prediction in LLMs
Yida Yin
Zekai Wang
Yuvan Sharma
Dantong Niu
Trevor Darrell
Roei Herzig
LM&Ro
114
1
0
16 Oct 2024
Large Model for Small Data: Foundation Model for Cross-Modal RF Human Activity Recognition
Yuxuan Weng
Guoquan Wu
Tianyue Zheng
Yanbing Yang
Jun Luo
21
5
0
13 Oct 2024
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
Hang Yin
Xiuwei Xu
Zhenyu Wu
Jie Zhou
Jiwen Lu
38
13
0
10 Oct 2024
ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution
Corban Rivera
Grayson Byrd
William Paul
Tyler Feldman
Meghan Booker
...
Krishna Murthy Jatavallabhula
Celso M. De Melo
Lalithkumar Seenivasan
Mathias Unberath
Rama Chellappa
LLMAG
LM&Ro
28
0
0
08 Oct 2024
PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM
Stefan Stefanache
Lluís Pastor Pérez
Julen Costa Watanabe
Ernesto Sanchez Tejedor
Thomas Hofmann
Enis Simsar
EGVM
28
0
0
08 Oct 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin
Chaehyun Kim
Sunghwan Hong
Seokju Cho
Anurag Arnab
Paul Hongsuck Seo
Seungryong Kim
VLM
34
1
0
30 Sep 2024
VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection
Liangyu Zhong
Joachim Sicking
Fabian Hüger
Hanno Gottschalk
VLM
35
0
0
25 Sep 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
46
3
0
16 Sep 2024
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
Wenlong Huang
Chen Wang
Yongqian Li
Ruohan Zhang
Li Fei-Fei
46
88
0
03 Sep 2024
Haptics-based, higher-order Sensory Substitution designed for Object Negotiation in Blindness and Low Vision: Virtual Whiskers
Junchi Feng
Giles Hamilton-Fletcher
Todd E. Hudson
Mahya Beheshti
Maurizio Porfiri
John-Ross Rizzo
36
0
0
26 Aug 2024
Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning
Alessio Devoto
Federico Alvetreti
Jary Pomponi
P. Lorenzo
Pasquale Minervini
Simone Scardapane
51
2
0
16 Aug 2024
Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation
Tri Ton
Ji Woo Hong
Soohwan Eom
Jun Yeop Shim
Junyeong Kim
Chang D. Yoo
3DPC
ISeg
47
2
0
16 Aug 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yunhong Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
36
3
0
05 Aug 2024
Visual Grounding for Object-Level Generalization in Reinforcement Learning
Haobin Jiang
Zongqing Lu
LM&Ro
34
2
0
04 Aug 2024
Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction
Gertjan J. Burghouts
M. Schaaphok
M. V. Bekkum
W. Meijer
Fieke Hillerstrom
Jelle van Mil
LM&Ro
28
0
0
18 Jul 2024