ResearchTrend.AI
Simple Open-Vocabulary Object Detection with Vision Transformers

12 May 2022
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
Alexey Dosovitskiy
Aravindh Mahendran
Anurag Arnab
Mostafa Dehghani
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
    ObjD
    CLIP
    VLM
    ViT
    OCL

Papers citing "Simple Open-Vocabulary Object Detection with Vision Transformers"

50 / 247 papers shown
Affordance-Guided Reinforcement Learning via Visual Prompting
Olivia Y. Lee
Annie Xie
Kuan Fang
Karl Pertsch
Chelsea Finn
OffRL
LM&Ro
74
7
0
14 Jul 2024
Real-Time Anomaly Detection and Reactive Planning with Large Language Models
Rohan Sinha
Amine Elhafsi
Christopher Agia
Matthew Foutter
Edward Schmerling
Marco Pavone
OffRL
LRM
45
26
0
11 Jul 2024
Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs
Hao-Tien Lewis Chiang
Zhuo Xu
Zipeng Fu
M. Jacob
Tingnan Zhang
...
Carolina Parada
Chelsea Finn
Peng Xu
Sergey Levine
Jie Tan
LM&Ro
51
20
0
10 Jul 2024
Transfer Learning with Self-Supervised Vision Transformers for Snake Identification
Anthony Miyaguchi
Murilo Gustineli
Austin Fischer
Ryan Lundqvist
27
3
0
08 Jul 2024
Language-Guided Object-Centric Diffusion Policy for Generalizable and Collision-Aware Robotic Manipulation
Hang Li
Qian Feng
Zhi Zheng
Jianxiang Feng
Zhaopeng Chen
Alois Knoll
26
1
0
29 Jun 2024
3D Feature Distillation with Object-Centric Priors
Georgios Tziafas
Yucheng Xu
Zhibin Li
H. Kasaei
34
1
0
26 Jun 2024
Towards Open-World Grasping with Large Vision-Language Models
Georgios Tziafas
H. Kasaei
LM&Ro
LRM
37
12
0
26 Jun 2024
Accurate and Fast Pixel Retrieval with Spatial and Uncertainty Aware Hypergraph Diffusion
G. An
Yuchi Huo
Sung-eui Yoon
52
0
0
17 Jun 2024
AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
Xiyang Wu
Tianrui Guan
Dianqi Li
Shuaiyi Huang
Xiaoyu Liu
...
Abhinav Shrivastava
Furong Huang
Jordan L. Boyd-Graber
Dinesh Manocha
HILM
LRM
VLM
MLLM
30
14
0
16 Jun 2024
Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning
Xiaowen Sun
Xufeng Zhao
Jae Hee Lee
Wenhao Lu
Matthias Kerzel
Stefan Wermter
LM&Ro
39
2
0
14 Jun 2024
Understanding Visual Concepts Across Models
Brandon Trabucco
Max Gurinas
Kyle Doherty
Ruslan Salakhutdinov
VLM
45
0
0
11 Jun 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma
Shiliang Zhang
Longhui Wei
Qi Tian
VLM
41
0
0
07 Jun 2024
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
47
6
0
02 Jun 2024
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
Fangyi Chen
Han Zhang
Zhantao Yang
Hao Chen
Kai Hu
Marios Savvides
ObjD
VLM
41
5
0
30 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
58
4
0
28 May 2024
Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies
Jianing Qian
Anastasios Panagopoulos
Dinesh Jayaraman
LM&Ro
ViT
38
5
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
42
0
23 May 2024
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Tianhe Ren
Qing Jiang
Shilong Liu
Zhaoyang Zeng
Wenlong Liu
...
Hao Zhang
Feng Li
Peijun Tang
Kent Yu
Lei Zhang
ObjD
VLM
42
34
0
16 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
33
13
0
16 May 2024
Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
Sunyuan Qiang
Xianfei Li
Yanyan Liang
Wenlong Liao
Tao He
Pai Peng
ObjD
40
0
0
14 May 2024
G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios
Zeyu Wang
Yuanchun Shi
Yuntao Wang
Yuchen Yao
Kun Yan
Yuhan Wang
Lei Ji
Xuhai Xu
Chun Yu
40
7
0
13 May 2024
Watch Your Step: Optimal Retrieval for Continual Learning at Scale
Truman Hickok
Dhireesha Kudithipudi
37
1
0
16 Apr 2024
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee
Tejas Gokhale
Chitta Baral
Yezhou Yang
VLM
32
2
0
12 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
44
20
0
09 Apr 2024
Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity
Jacob Varley
Sumeet Singh
Deepali Jain
Krzysztof Choromanski
Andy Zeng
Somnath Basu Roy Chowdhury
Kumar Avinava Dubey
Vikas Sindhwani
LM&Ro
34
14
0
04 Apr 2024
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu
Yongxin Yang
Shifeng Zhang
Fei Chen
Steven G. McDonagh
Gerasimos Lampouras
Ignacio Iacobacci
Sarah Parisot
42
10
0
03 Apr 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
34
14
0
29 Mar 2024
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang
Yali Li
Taichi Liu
Hengshuang Zhao
Shengjin Wang
3DPC
ObjD
40
7
0
28 Mar 2024
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
Ganlong Zhao
Guanbin Li
Weikai Chen
Yizhou Yu
37
4
0
26 Mar 2024
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
Djamahl Etchegaray
Zi Huang
Tatsuya Harada
Yadan Luo
31
9
0
20 Mar 2024
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin
Yi-Xin Jiang
Lizhen Qu
Zehuan Yuan
Jianfei Cai
ObjD
VLM
53
13
0
15 Mar 2024
Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization
Zhao Wang
Aoxue Li
Fengwei Zhou
Zhenguo Li
Qi Dou
ObjD
VLM
32
2
0
14 Mar 2024
Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines
Liang Wu
X.-G. Ma
40
1
0
14 Mar 2024
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
Zicheng Zhang
Tong Zhang
Yi Zhu
Jian-zhuo Liu
Xiaodan Liang
QiXiang Ye
Wei Ke
VLM
49
2
0
13 Mar 2024
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
Haoxu Huang
Fanqi Lin
Yingdong Hu
Shengjie Wang
Yang Gao
38
49
0
13 Mar 2024
TutoAI: A Cross-domain Framework for AI-assisted Mixed-media Tutorial Creation on Physical Tasks
Yuexi Chen
Vlad I. Morariu
Anh Truong
Zhicheng Liu
DiffM
VGen
45
4
0
12 Mar 2024
DeliGrasp: Inferring Object Properties with LLMs for Adaptive Grasp Policies
William Xie
Jensen Lavering
N. Correll
20
7
0
12 Mar 2024
PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck
Thang M. Pham
Peijie Chen
Tin Nguyen
Seunghyun Yoon
Trung Bui
Anh Nguyen
VLM
49
7
0
08 Mar 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Ibrahim M. Alabdulmohsin
Xiao Wang
Andreas Steiner
Priya Goyal
Alexander D'Amour
Xiao-Qi Zhai
34
16
0
07 Mar 2024
When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability
Wenjie Xuan
Yufei Xu
Shanshan Zhao
Chaoyue Wang
Juhua Liu
Bo Du
Dacheng Tao
26
2
0
01 Mar 2024
Verifiably Following Complex Robot Instructions with Foundation Models
Benedict Quartey
Eric Rosen
Stefanie Tellex
George Konidaris
LM&Ro
47
11
0
18 Feb 2024
Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback
V. Bhat
Ali Umut Kaypak
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
LM&Ro
66
13
0
13 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
48
45
0
08 Feb 2024
LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
Sheng Jin
Xue-Qiu Jiang
Jiaxing Huang
Lewei Lu
Shijian Lu
VLM
ObjD
31
21
0
07 Feb 2024
ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation
Jirayu Burapacheep
Ishan Gaur
Agam Bhatia
Tristan Thrush
40
4
0
07 Feb 2024
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang
Runyu Ding
Ellis L Brown
Xiaojuan Qi
Saining Xie
LM&Ro
56
19
0
05 Feb 2024
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng
Lin Song
Yixiao Ge
Wenyu Liu
Xinggang Wang
Ying Shan
VLM
ObjD
38
249
0
30 Jan 2024
Growing from Exploration: A self-exploring framework for robots based on foundation models
Shoujie Li
Ran Yu
Tong Wu
JunWen Zhong
Xiao-Ping Zhang
Wenbo Ding
24
1
0
24 Jan 2024
On the Efficacy of Text-Based Input Modalities for Action Anticipation
Apoorva Beedu
Karan Samel
Irfan Essa
53
2
0
23 Jan 2024
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Peiqi Liu
Yaswanth Orru
Jay Vakil
Chris Paxton
Nur Muhammad (Mahi) Shafiullah
Lerrel Pinto
LM&Ro
VLM
103
39
0
22 Jan 2024