ResearchTrend.AI
YOLO-World: Real-Time Open-Vocabulary Object Detection

30 January 2024
Tianheng Cheng
Lin Song
Yixiao Ge
Wenyu Liu
Xinggang Wang
Ying Shan
    VLM
    ObjD
arXiv: 2401.17270

Papers citing "YOLO-World: Real-Time Open-Vocabulary Object Detection"

50 / 160 papers shown
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Hu Yue
Siyuan Huang
Yue Liao
Shengcong Chen
Pengfei Zhou
Liliang Chen
Maoqing Yao
Guanghui Ren
VGen
29
0
0
14 May 2025
Real-Time Privacy Preservation for Robot Visual Perception
Minkyu Choi
Yunhao Yang
N. Bhatt
Kushagra Gupta
Sahil Shah
Aditya Rai
David Fridovich-Keil
Ufuk Topcu
Sandeep P. Chinchali
32
0
0
08 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
53
0
0
08 May 2025
Visual Affordances: Enabling Robots to Understand Object Functionality
Tommaso Apicella
Alessio Xompero
Andrea Cavallaro
43
0
0
08 May 2025
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
53
0
0
06 May 2025
Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images
Zengli Luo
Canlong Zhang
Xiaochun Lu
Zhixin Li
Zhiwen Wang
29
0
0
06 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
52
0
0
06 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
69
1
0
01 May 2025
XeMap: Contextual Referring in Large-Scale Remote Sensing Environments
Y. Li
Lu Si
Y. T. Hou
Chengaung Liu
B. Li
Hongjian Fang
J. Zhang
79
0
0
30 Apr 2025
Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models
Patrick Müller
Alexander Braun
M. Keuper
59
0
0
25 Apr 2025
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
Minjae Kang
Martim Brandão
61
0
0
25 Apr 2025
A Decade of You Only Look Once (YOLO) for Object Detection
Leo Thomas Ramos
Angel D. Sappa
66
0
0
24 Apr 2025
NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation
Junyuan Fang
Zihan Wang
Y. Zhang
Shuzhe Wang
Iaroslav Melekhov
Juho Kannala
VLM
46
0
0
20 Apr 2025
Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization
Haiyong Yu
Yanqiong Jin
Yonghao He
Wei Sui
27
0
0
14 Apr 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
J. Zhang
...
Jiahui Lv
Z. Liu
Tengyuan Shi
Qingjie Liu
Y. Wang
MLLM
VLM
63
1
0
13 Apr 2025
DSM: Building A Diverse Semantic Map for 3D Visual Grounding
Qinghongbing Xie
Zijian Liang
Long Zeng
29
0
0
11 Apr 2025
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
42
0
0
10 Apr 2025
Few-Shot Adaptation of Grounding DINO for Agricultural Domain
Rajhans Singh
Rafael Bidese Puhl
Kshitiz Dhakal
Sudhir Sornapudi
28
0
0
09 Apr 2025
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang
Duo Peng
Feng Chen
Y. Yang
Yinjie Lei
DiffM
76
0
0
02 Apr 2025
Detecting Glioma, Meningioma, and Pituitary Tumors, and Normal Brain Tissues based on Yolov11 and Yolov8 Deep Learning Models
Ahmed M. Taha
Salah A. Aly
Mohamed F. Darwish
31
0
0
31 Mar 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng
Ziqi Huang
Hongbo Liu
Kai Zou
Yinan He
...
Y. Zhang
Jingwen He
Wei-Shi Zheng
Yu Qiao
Ziwei Liu
EGVM
VGen
48
5
0
27 Mar 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
42
0
0
25 Mar 2025
xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion
Saad Lahlali
Sandra Kara
Hejer Ammar
Florian Chabot
Nicolas Granger
Hervé Le Borgne
Q. C. Pham
3DPC
57
0
0
19 Mar 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
C. L. P. Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM
LRM
76
0
0
17 Mar 2025
MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution
Zejia Zhang
Bo-Rong Yang
Xinxing Chen
Weizhuang Shi
Haoyuan Wang
Wei Luo
Jian Huang
48
0
0
17 Mar 2025
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo
Ziyang Chen
Shaoguang Wang
JianXiang He
Yijie Xu
Jinhui Ye
Ying Sun
Hui Xiong
49
1
0
17 Mar 2025
SPOC: Spatially-Progressing Object State Change Segmentation in Video
Priyanka Mandikal
Tushar Nagarajan
Alex Stoken
Zihui Xue
Kristen Grauman
44
0
0
15 Mar 2025
Online Language Splatting
Saimouli Katragadda
Cho-Ying Wu
Yuliang Guo
Xinyu Huang
G. Huang
Liu Ren
3DGS
OffRL
60
0
0
12 Mar 2025
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li
Yifan Jiao
Dan Meng
Heng Fan
L. Zhang
60
0
0
11 Mar 2025
SAS: Segment Any 3D Scene with Integrated 2D Priors
Z. Li
Jiahao Lu
Jiacheng Deng
Hanzhi Chang
Lifan Wu
Yanzhe Liang
Tianzhu Zhang
57
0
0
11 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
151
0
0
11 Mar 2025
FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
Dennis Rotondi
Fabio Scaparro
Hermann Blum
Kai O. Arras
41
0
0
10 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
VLM
ObjD
74
1
0
10 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
54
0
0
10 Mar 2025
Handle Object Navigation as Weighted Traveling Repairman Problem
Ruimeng Liu
Xinhang Xu
Shenghai Yuan
Lihua Xie
68
2
0
10 Mar 2025
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow
Evelien Riddell
Yimu Wang
Sean Sedwards
Krzysztof Czarnecki
3DPC
46
0
0
09 Mar 2025
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang
Yongchao Feng
Shuai Yang
Z. Liu
Qingjie Liu
Y. Wang
ObjD
160
0
0
08 Mar 2025
Robust Computer-Vision based Construction Site Detection for Assistive-Technology Applications
Junchi Feng
Giles Hamilton-Fletcher
Nikhil Ballem
Michael Batavia
Yifei Wang
Jiuling Zhong
Maurizio Porfiri
John-Ross Rizzo
53
0
0
06 Mar 2025
Shaken, Not Stirred: A Novel Dataset for Visual Understanding of Glasses in Human-Robot Bartending Tasks
Lukás Gajdosech
Hassan Ali
Jan-Gerrit Habekost
Martin Madaras
Matthias Kerzel
Stefan Wermter
54
0
0
06 Mar 2025
OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding
Dianyi Yang
Yu Gao
Xihan Wang
Yufeng Yue
Yi Yang
M. Fu
3DGS
64
1
0
03 Mar 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
ObjD
VLM
49
0
0
28 Feb 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
J. Liu
Peng Wang
Guoqing Wang
Y. Yang
H. Shen
ObjD
94
0
0
27 Feb 2025
Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models
Yaqi Sun
Kyohei Atarashi
Koh Takeuchi
Hisashi Kashima
MLLM
49
0
0
24 Feb 2025
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
Yuheng Xue
Nenglun Chen
Jun Liu
Wenyun Sun
3DPC
66
7
0
24 Feb 2025
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
Caixiong Li
Xiongwei Zhao
Jinhang Zhang
Xing Zhang
Qihao Sun
Zhou Wu
ObjD
MLLM
VLM
56
0
0
23 Feb 2025
OpenVox: Real-time Instance-level Open-vocabulary Probabilistic Voxel Representation
Yinan Deng
Bicheng Yao
Yihang Tang
Yi Yang
Yufeng Yue
36
0
0
23 Feb 2025
Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment
Zeyu Shangguan
Daniel Seita
Mohammad Rostami
ObjD
61
0
0
23 Feb 2025
DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation
Luzhou Ge
Xiangyu Zhu
Zhuo Yang
Xuesong Li
3DGS
70
0
0
21 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
131
4
0
18 Feb 2025
Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
Abdalwhab Abdalwhab
A. Imran
Sina Heydarian
I. Iordanova
David St-Onge
49
0
0
16 Jan 2025