2401.17270
Cited By
YOLO-World: Real-Time Open-Vocabulary Object Detection
30 January 2024
Tianheng Cheng
Lin Song
Yixiao Ge
Wenyu Liu
Xinggang Wang
Ying Shan
VLM
ObjD
Papers citing "YOLO-World: Real-Time Open-Vocabulary Object Detection"
50 / 160 papers shown
Title
Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models
Y. Ranasinghe
Vibashan Vs
James Uplinger
C. D. Melo
Vishal M. Patel
36
0
0
13 Jan 2025
Toward Realistic Camouflaged Object Detection: Benchmarks and Method
Zhimeng Xin
Tianxu Wu
Shiming Chen
Shuo Ye
Zijing Xie
Yixiong Zou
Xinge You
Yufei Guo
31
0
0
13 Jan 2025
Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT
Wen-Dong Jiang
Chih-Yung Chang
Diptendu Sinha Roy
40
0
0
07 Jan 2025
RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba
Andong Lu
Wanyu Wang
Chenglong Li
Jin Tang
B. Luo
Mamba
49
2
0
31 Dec 2024
YOLO-UniOW: Efficient Universal Open-World Object Detection
Lihao Liu
Juexiao Feng
Hui Chen
Ao Wang
Lin Song
J. Han
Guiguang Ding
ObjD
VLM
46
2
0
31 Dec 2024
AI-Powered Urban Transportation Digital Twin: Methods and Applications
Xuan Di
Yongjie Fu
Mehmet K. Turkcan
Mahshid Ghasemi
Zhaobin Mo
Chengbo Zang
Abhishek Adhikari
Z. Kostić
Gil Zussman
AI4CE
33
0
0
30 Dec 2024
Leveraging Content and Context Cues for Low-Light Image Enhancement
Igor Morawski
Kai He
Shusil Dangi
Winston H. Hsu
93
0
0
10 Dec 2024
Towards Real-Time Open-Vocabulary Video Instance Segmentation
Bin Yan
Martin Sundermeyer
D. Tan
Huchuan Lu
F. Tombari
VLM
VOS
92
1
0
05 Dec 2024
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
Weixin Mao
Weiheng Zhong
Zhou Jiang
Dong Fang
Zhongyue Zhang
...
Fan Jia
Tiancai Wang
Haoqiang Fan
Osamu Yoshie
119
4
0
29 Nov 2024
Don't Let Your Robot be Harmful: Responsible Robotic Manipulation
Minheng Ni
Lei Zhang
Zhe Chen
L. Zhang
Wangmeng Zuo
72
1
0
27 Nov 2024
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li
Zhengkang Xiang
Joseph West
Kourosh Khoshelham
ObjD
VLM
94
1
0
27 Nov 2024
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia
Jishuo Li
Zhiwei Lin
Xinhao Wang
Yixuan Wang
Ming-Hsuan Yang
VLM
74
2
0
26 Nov 2024
Open Vocabulary Monocular 3D Object Detection
Jin Yao
Hao Gu
Xuweiyi Chen
Jiayun Wang
Zezhou Cheng
ObjD
VLM
73
3
0
25 Nov 2024
Language Driven Occupancy Prediction
Zhu Yu
Bowen Pang
Lizhe Liu
Runmin Zhang
Qihao Peng
Maochun Luo
Sheng Yang
Mingxia Chen
Si-Yuan Cao
Hui-Liang Shen
87
2
0
25 Nov 2024
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen
Siyuan Liang
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Hua Zhang
Xiaochun Cao
FAtt
82
4
0
25 Nov 2024
Fine-Grained Open-Vocabulary Object Recognition via User-Guided Segmentation
Jinwoo Ahn
Hyeokjoon Kwon
Hwiyeon Yoo
ObjD
VLM
77
0
0
23 Nov 2024
I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
Zihan Wang
Brian Liang
Varad Dhat
Zander Brumbaugh
Nick Walker
Ranjay Krishna
Maya Cakmak
61
4
0
20 Nov 2024
An Application-Agnostic Automatic Target Recognition System Using Vision Language Models
Anthony Palladino
Dana Gajewski
Abigail Aronica
Patryk Deptula
Alexander Hamme
...
Jeff Muri
Todd Nelling
Michael A. Riley
Brian Wong
Margaret Duff
32
1
0
05 Nov 2024
Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation
Yan Li
Weiwei Guo
Xue Yang
Ning Liao
Shaofeng Zhang
Yi Yu
Wenxian Yu
Junchi Yan
ObjD
38
0
0
04 Nov 2024
ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation
Hengkai Tan
Xuezhou Xu
Chengyang Ying
Xinyi Mao
Songming Liu
Xingxing Zhang
Hang Su
Jun Zhu
46
4
0
04 Nov 2024
SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
Cheng-Chun Hsu
Bowen Wen
Jie Xu
Yashraj S. Narang
Xiaolong Wang
Yuke Zhu
Joydeep Biswas
Stan Birchfield
DiffM
41
8
0
01 Nov 2024
From Explicit Rules to Implicit Reasoning in an Interpretable Violence Monitoring System
Wen-Dong Jiang
Chih-Yung Chang
Ssu-Chi Kuai
Diptendu Sinha Roy
40
0
0
29 Oct 2024
YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions
Xiguang Li
Jiafu Chen
Yunhe Sun
Na Lin
Ammar Hawbani
Liang Zhao
VLM
28
0
0
23 Oct 2024
Few-shot target-driven instance detection based on open-vocabulary object detection models
Ben Crulis
Barthélémy Serres
Cyril de Runz
Gilles Venturini
VLM
ObjD
24
0
0
21 Oct 2024
MSGField: A Unified Scene Representation Integrating Motion, Semantics, and Geometry for Robotic Manipulation
Yu Sheng
Runfeng Lin
L. Wang
Quecheng Qiu
Yanyong Zhang
Yu Zhang
Bei Hua
Jianmin Ji
3DV
3DGS
31
0
0
21 Oct 2024
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
Hao-Tang Tsui
Chien-Yao Wang
H. Liao
ObjD
VLM
51
0
0
20 Oct 2024
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
Runsen Xu
Zhiwei Huang
Tai Wang
Y. Chen
Jiangmiao Pang
Dahua Lin
VGen
41
11
0
17 Oct 2024
Reference-Based Post-OCR Processing with LLM for Precise Diacritic Text in Historical Document Recognition
T. Do
Dinh Phu Tran
An Vo
Daeyoung Kim
24
0
0
17 Oct 2024
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao
Jiaming Han
Changsheng Li
Yu-Feng Li
Xiangyu Yue
RALM
53
1
0
17 Oct 2024
Words to Wheels: Vision-Based Autonomous Driving Understanding Human Language Instructions Using Foundation Models
Chanhoe Ryu
Hyunki Seong
Daegyu Lee
Seongwoo Moon
Sungjae Min
David Hyunchul Shim
19
0
0
14 Oct 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
37
3
0
13 Oct 2024
Ego3DT: Tracking Every 3D Object in Ego-centric Videos
Shengyu Hao
Wenhao Chai
Zhonghan Zhao
Meiqi Sun
Wendi Hu
...
Yixian Zhao
Qi Li
Yizhou Wang
Xi Li
Gaoang Wang
37
1
0
11 Oct 2024
Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
Zhiwei Lin
Yongtao Wang
Zhi Tang
ObjD
VLM
30
2
0
08 Oct 2024
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection
Zishuo Wang
Wenhao Zhou
Jinglin Xu
Yuxin Peng
ObjD
VLM
21
1
0
08 Oct 2024
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-object Demand-driven Navigation
Hongcheng Wang
Peiqi Liu
Wenzhe Cai
Mingdong Wu
Zhengyu Qian
Hao Dong
21
0
0
04 Oct 2024
Adaptive Masking Enhances Visual Grounding
Sen Jia
Lei Li
26
0
0
04 Oct 2024
Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking
Ayesha Ishaq
Mohamed El Amine Boudjoghra
Jean Lahoud
F. Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
94
1
0
02 Oct 2024
Resolving Positional Ambiguity in Dialogues by Vision-Language Models for Robot Navigation
Kuan-Lin Chen
Tzu-Ti Wei
Li-Tzu Yeh
Elaine Kao
Yu-Chee Tseng
Jen-Jee Chen
LM&Ro
24
0
0
30 Sep 2024
You Only Speak Once to See
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
35
1
0
27 Sep 2024
Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience
Leonard Barmann
Chad DeChant
Joana Plewnia
Fabian Peller-Konrad
Daniel Bauer
Tamim Asfour
Alex Waibel
LM&Ro
32
1
0
26 Sep 2024
OW-Rep: Open World Object Detection with Instance Representation Learning
Sunoh Lee
Minsik Jeon
Jihong Min
Junwon Seo
ObjD
149
0
0
24 Sep 2024
Automatic Behavior Tree Expansion with LLMs for Robotic Manipulation
Jonathan Styrud
Matteo Iovino
M. Norrlöf
Mårten Björkman
Christian Smith
LLMAG
29
4
0
20 Sep 2024
Enhancing Agricultural Environment Perception via Active Vision and Zero-Shot Learning
Michele Carlo La Greca
Mirko Usuelli
Matteo Matteucci
21
0
0
19 Sep 2024
Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks
Joji Joseph
B. Amrutur
Shalabh Bhatnagar
3DGS
35
1
0
18 Sep 2024
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
28
0
0
18 Sep 2024
One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation
F. L. Busch
Timon Homberger
Jesús Ortega-Peimbert
Quantao Yang
Olov Andersson
34
1
0
18 Sep 2024
Synthetic data augmentation for robotic mobility aids to support blind and low vision people
Hochul Hwang
Krisha Adhikari
Satya Shodhaka
Donghyun Kim
26
0
0
17 Sep 2024
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
Zhixi Cai
Cristian Rojas Cardenas
Kevin Leo
Chenyuan Zhang
Kal Backman
...
Yuan-Fang Li
Mor Vered
Peter J. Stuckey
M. G. D. L. Banda
Hamid Rezatofighi
36
7
0
16 Sep 2024
Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation
Qilong Zhangli
Di Liu
Abhishek Aich
Dimitris Metaxas
S. Schulter
36
0
0
15 Sep 2024
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
Haoxuan Wang
Q. He
Jinlong Peng
Hao Yang
Mingmin Chi
Yabiao Wang
Mamba
39
1
0
13 Sep 2024