arXiv:2401.17270
YOLO-World: Real-Time Open-Vocabulary Object Detection
30 January 2024
Tianheng Cheng
Lin Song
Yixiao Ge
Wenyu Liu
Xinggang Wang
Ying Shan
VLM
ObjD
Papers citing
"YOLO-World: Real-Time Open-Vocabulary Object Detection"
50 / 160 papers shown
Title
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Hu Yue
Siyuan Huang
Yue Liao
Shengcong Chen
Pengfei Zhou
Liliang Chen
Maoqing Yao
Guanghui Ren
VGen
29
0
0
14 May 2025
Real-Time Privacy Preservation for Robot Visual Perception
Minkyu Choi
Yunhao Yang
N. Bhatt
Kushagra Gupta
Sahil Shah
Aditya Rai
David Fridovich-Keil
Ufuk Topcu
Sandeep P. Chinchali
32
0
0
08 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
53
0
0
08 May 2025
Visual Affordances: Enabling Robots to Understand Object Functionality
Tommaso Apicella
Alessio Xompero
Andrea Cavallaro
43
0
0
08 May 2025
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
53
0
0
06 May 2025
Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images
Zengli Luo
Canlong Zhang
Xiaochun Lu
Zhixin Li
Zhiwen Wang
29
0
0
06 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
52
0
0
06 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
69
1
0
01 May 2025
XeMap: Contextual Referring in Large-Scale Remote Sensing Environments
Y. Li
Lu Si
Y. T. Hou
Chengaung Liu
B. Li
Hongjian Fang
J. Zhang
79
0
0
30 Apr 2025
Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models
Patrick Müller
Alexander Braun
M. Keuper
59
0
0
25 Apr 2025
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
Minjae Kang
Martim Brandão
61
0
0
25 Apr 2025
A Decade of You Only Look Once (YOLO) for Object Detection
Leo Thomas Ramos
Angel D. Sappa
66
0
0
24 Apr 2025
NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation
Junyuan Fang
Zihan Wang
Y. Zhang
Shuzhe Wang
Iaroslav Melekhov
Juho Kannala
VLM
46
0
0
20 Apr 2025
Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization
Haiyong Yu
Yanqiong Jin
Yonghao He
Wei Sui
27
0
0
14 Apr 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
J. Zhang
...
Jiahui Lv
Z. Liu
Tengyuan Shi
Qingjie Liu
Y. Wang
MLLM
VLM
63
1
0
13 Apr 2025
DSM: Building A Diverse Semantic Map for 3D Visual Grounding
Qinghongbing Xie
Zijian Liang
Long Zeng
29
0
0
11 Apr 2025
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
42
0
0
10 Apr 2025
Few-Shot Adaptation of Grounding DINO for Agricultural Domain
Rajhans Singh
Rafael Bidese Puhl
Kshitiz Dhakal
Sudhir Sornapudi
28
0
0
09 Apr 2025
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang
Duo Peng
Feng Chen
Y. Yang
Yinjie Lei
DiffM
76
0
0
02 Apr 2025
Detecting Glioma, Meningioma, and Pituitary Tumors, and Normal Brain Tissues based on Yolov11 and Yolov8 Deep Learning Models
Ahmed M. Taha
Salah A. Aly
Mohamed F. Darwish
31
0
0
31 Mar 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng
Ziqi Huang
Hongbo Liu
Kai Zou
Yinan He
...
Y. Zhang
Jingwen He
Wei-Shi Zheng
Yu Qiao
Ziwei Liu
EGVM
VGen
48
5
0
27 Mar 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
42
0
0
25 Mar 2025
xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion
Saad Lahlali
Sandra Kara
Hejer Ammar
Florian Chabot
Nicolas Granger
Hervé Le Borgne
Q. C. Pham
3DPC
57
0
0
19 Mar 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
C. L. P. Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM
LRM
76
0
0
17 Mar 2025
MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution
Zejia Zhang
Bo-Rong Yang
Xinxing Chen
Weizhuang Shi
Haoyuan Wang
Wei Luo
Jian Huang
48
0
0
17 Mar 2025
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo
Ziyang Chen
Shaoguang Wang
JianXiang He
Yijie Xu
Jinhui Ye
Ying Sun
Hui Xiong
49
1
0
17 Mar 2025
SPOC: Spatially-Progressing Object State Change Segmentation in Video
Priyanka Mandikal
Tushar Nagarajan
Alex Stoken
Zihui Xue
Kristen Grauman
44
0
0
15 Mar 2025
Online Language Splatting
Saimouli Katragadda
Cho-Ying Wu
Yuliang Guo
Xinyu Huang
G. Huang
Liu Ren
3DGS
OffRL
60
0
0
12 Mar 2025
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li
Yifan Jiao
Dan Meng
Heng Fan
L. Zhang
60
0
0
11 Mar 2025
SAS: Segment Any 3D Scene with Integrated 2D Priors
Z. Li
Jiahao Lu
Jiacheng Deng
Hanzhi Chang
Lifan Wu
Yanzhe Liang
Tianzhu Zhang
57
0
0
11 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
151
0
0
11 Mar 2025
FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
Dennis Rotondi
Fabio Scaparro
Hermann Blum
Kai O. Arras
41
0
0
10 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
VLM
ObjD
74
1
0
10 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
54
0
0
10 Mar 2025
Handle Object Navigation as Weighted Traveling Repairman Problem
Ruimeng Liu
Xinhang Xu
Shenghai Yuan
Lihua Xie
68
2
0
10 Mar 2025
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow
Evelien Riddell
Yimu Wang
Sean Sedwards
Krzysztof Czarnecki
3DPC
46
0
0
09 Mar 2025
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang
Yongchao Feng
Shuai Yang
Z. Liu
Qingjie Liu
Y. Wang
ObjD
160
0
0
08 Mar 2025
Robust Computer-Vision based Construction Site Detection for Assistive-Technology Applications
Junchi Feng
Giles Hamilton-Fletcher
Nikhil Ballem
Michael Batavia
Yifei Wang
Jiuling Zhong
Maurizio Porfiri
John-Ross Rizzo
53
0
0
06 Mar 2025
Shaken, Not Stirred: A Novel Dataset for Visual Understanding of Glasses in Human-Robot Bartending Tasks
Lukás Gajdosech
Hassan Ali
Jan-Gerrit Habekost
Martin Madaras
Matthias Kerzel
Stefan Wermter
54
0
0
06 Mar 2025
OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding
Dianyi Yang
Yu Gao
Xihan Wang
Yufeng Yue
Yi Yang
M. Fu
3DGS
64
1
0
03 Mar 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
ObjD
VLM
49
0
0
28 Feb 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
J. Liu
Peng Wang
Guoqing Wang
Y. Yang
H. Shen
ObjD
94
0
0
27 Feb 2025
Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models
Yaqi Sun
Kyohei Atarashi
Koh Takeuchi
Hisashi Kashima
MLLM
49
0
0
24 Feb 2025
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
Yuheng Xue
Nenglun Chen
Jun Liu
Wenyun Sun
3DPC
66
7
0
24 Feb 2025
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
Caixiong Li
Xiongwei Zhao
Jinhang Zhang
Xing Zhang
Qihao Sun
Zhou Wu
ObjD
MLLM
VLM
56
0
0
23 Feb 2025
OpenVox: Real-time Instance-level Open-vocabulary Probabilistic Voxel Representation
Yinan Deng
Bicheng Yao
Yihang Tang
Yi Yang
Yufeng Yue
36
0
0
23 Feb 2025
Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment
Zeyu Shangguan
Daniel Seita
Mohammad Rostami
ObjD
61
0
0
23 Feb 2025
DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation
Luzhou Ge
Xiangyu Zhu
Zhuo Yang
Xuesong Li
3DGS
70
0
0
21 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
131
4
0
18 Feb 2025
Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
Abdalwhab Abdalwhab
A. Imran
Sina Heydarian
I. Iordanova
David St-Onge
49
0
0
16 Jan 2025