YOLO-World: Real-Time Open-Vocabulary Object Detection

arXiv:2401.17270
30 January 2024
Tianheng Cheng
Lin Song
Yixiao Ge
Wenyu Liu
Xinggang Wang
Ying Shan
    VLM
    ObjD

Papers citing "YOLO-World: Real-Time Open-Vocabulary Object Detection"

50 / 160 papers shown
From COCO to COCO-FP: A Deep Dive into Background False Positives for COCO Detectors
Longfei Liu
Wen Guo
S. Huang
Cheng Li
Xi Shen
ObjD
41
0
0
12 Sep 2024
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection
Pengfei Qi
Yifei Zhang
Wenqiang Li
Youwen Hu
Kunlong Bai
ObjD
45
0
0
10 Sep 2024
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Qi Yang
Binjie Mao
Zili Wang
Xing Nie
Pengfei Gao
Ying Guo
Cheng Zhen
Pengfei Yan
Shiming Xiang
VGen
DiffM
43
5
0
10 Sep 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng
Zixu Zhao
Tong He
Tianjun Xiao
Yicong Zhou
Zheng Zhang
DiffM
44
0
0
07 Sep 2024
SPDiffusion: Semantic Protection Diffusion Models for Multi-concept Text-to-image Generation
Yang Zhang
Rui Zhang
Xuecheng Nie
Haochen Li
Jikun Chen
Yifan Hao
Xin Zhang
Luoqi Liu
Ling Li
43
0
0
02 Sep 2024
EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution
F. Argenziano
Michele Brienza
Vincenzo Suriani
Daniele Nardi
D. Bloisi
LM&Ro
46
1
0
30 Aug 2024
OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation
Muhammad Rameez Ur Rahman
Piero Simonetto
Anna Polato
Francesco Pasti
Luca Tonin
Sebastiano Vascon
3DPC
41
0
0
25 Aug 2024
OVA-Det: Open Vocabulary Aerial Object Detection with Image-Text Collaboration
Guoting Wei
Xia Yuan
Yu Liu
Zhenhao Shang
Kelu Yao
Peng Wang
Qingsen Yan
Chunxia Zhao
Haokui Zhang
Rong Xiao
VLM
ObjD
55
1
0
22 Aug 2024
On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes
Sadia Ilyas
Ido Freeman
Matthias Rottmann
ObjD
51
3
0
20 Aug 2024
ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming
Jaylin Herskovitz
Andi Xu
Rahaf Alharbi
Anhong Guo
29
2
0
20 Aug 2024
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama
Jing Tang
Quanlu Jia
Yuqiang Xie
Zeyu Gong
Xiang Wen
Jiayi Zhang
Yalong Guo
Guibin Chen
Jiangping Yang
VGen
30
1
0
18 Aug 2024
YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems
Chien-Yao Wang
Hong-Yuan Mark Liao
ObjD
41
33
0
18 Aug 2024
WorldScribe: Towards Context-Aware Live Visual Descriptions
Ruei-Che Chang
Yuxuan Liu
Anhong Guo
56
14
0
13 Aug 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Zilyu Ye
Jinxiu Liu
Ruotian Peng
Jinjin Cao
Zhiyang Chen
...
Mingyuan Zhou
Xiaoqian Shen
Mohamed Elhoseiny
Qi Liu
Guo-Jun Qi
VGen
VLM
37
1
0
07 Aug 2024
CerberusDet: Unified Multi-Task Object Detection
Irina Tolstykh
Mikhail Chernyshov
Maksim Kuprashevich
VLM
ObjD
56
0
0
17 Jul 2024
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Hao Ding
Tuxun Lu
Yuqian Zhang
Ruixing Liang
Hongchao Shu
...
Bo Wang
Marcos Fernández-Rodríguez
Estevao Lima
João L. Vilaça
Mathias Unberath
63
4
0
16 Jul 2024
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer
Kyra Ahrens
Stefan Wermter
CLIP
VLM
27
3
0
15 Jul 2024
An Autonomous Drone Swarm for Detecting and Tracking Anomalies among Dense Vegetation
Rakesh John Amala Arokia Nathan
Sigrid Strand
Daniel Mehrwald
Dmitriy Shutin
Oliver Bimber
33
0
0
15 Jul 2024
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer
Yu Wang
Xiangbo Su
Qiang Chen
Xinyu Zhang
Teng Xi
Kun Yao
Errui Ding
Gang Zhang
Jingdong Wang
ObjD
VLM
47
1
0
15 Jul 2024
Sensorimotor Attention and Language-based Regressions in Shared Latent Variables for Integrating Robot Motion Learning and LLM
Kanata Suzuki
Tetsuya Ogata
37
2
0
12 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
66
4
0
09 Jul 2024
A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation
Dazhao Du
Enhan Li
Hui Xiong
Fanjiang Xu
Jianwei Niu
Gang Hua
43
3
0
05 Jul 2024
Language-Guided Object-Centric Diffusion Policy for Generalizable and Collision-Aware Robotic Manipulation
Hang Li
Qian Feng
Zhi Zheng
Jianxiang Feng
Zhaopeng Chen
Alois Knoll
26
1
0
29 Jun 2024
SlideSLAM: Sparse, Lightweight, Decentralized Metric-Semantic SLAM for Multi-Robot Navigation
Xu Liu
Jiuzhou Lei
Ankit Prabhu
Yuezhan Tao
Igor Spasojevic
Pratik Chaudhari
Nikolay Atanasov
Vijay Kumar
46
7
0
25 Jun 2024
CogExplore: Contextual Exploration with Language-Encoded Environment Representations
Harel Biggie
Patrick Cooper
Doncey Albin
Kristen Such
Christoffer Heckman
LM&Ro
35
0
0
24 Jun 2024
GATSBI: An Online GTSP-Based Algorithm for Targeted Surface Bridge Inspection and Defect Detection
Harnaik Dhami
Charith Reddy
V. Sharma
Troi Williams
Pratap Tokekar
44
1
0
24 Jun 2024
SpatialBot: Precise Spatial Understanding with Vision Language Models
Wenxiao Cai
Yaroslav Ponomarenko
Jianhao Yuan
Xiaoqi Li
Wankou Yang
Hao Dong
Bo-Lu Zhao
VLM
56
28
0
19 Jun 2024
Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024
Peixi Wu
Bosong Chai
Xuan Nie
Longquan Yan
Zeyu Wang
Qifan Zhou
Boning Wang
Yansong Peng
Hebei Li
ObjD
31
1
0
13 Jun 2024
Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure
Minghao Yang
Linlin Gao
Pengyuan Li
Wenbo Li
Yihong Dong
Zhiying Cui
34
1
0
06 Jun 2024
GrootVL: Tree Topology is All You Need in State Space Model
Yicheng Xiao
Lin Song
Shaoli Huang
Jiangshan Wang
Siyu Song
Yixiao Ge
Xiu Li
Ying Shan
Mamba
44
10
0
04 Jun 2024
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra
Angela Dai
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
F. Khan
VLM
ISeg
83
6
0
04 Jun 2024
ELSA: Evaluating Localization of Social Activities in Urban Streets
Maryam Hosseini
Marco Cipriano
Sedigheh Eslami
Daniel Hodczak
Liu Liu
Andres Sevtsuk
Gerard de Melo
41
0
0
03 Jun 2024
It's a Feature, Not a Bug: Measuring Creative Fluidity in Image Generators
Aditi Ramaswamy
Melane Navaratnarajah
Hana Chockler
EGVM
42
0
0
03 Jun 2024
REvolve: Reward Evolution with Large Language Models using Human Feedback
Rishi Hazra
Alkis Sygkounas
A. Persson
Amy Loutfi
Pedro Zuidberg Dos Martires
38
1
0
03 Jun 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
58
4
0
28 May 2024
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games
Aoran Mei
Jianhua Wang
Guo-Niu Zhu
Zhongxue Gan
42
6
0
22 May 2024
Class-Conditional self-reward mechanism for improved Text-to-Image models
Safouane El Ghazouali
Arnaud Gucciardi
Umberto Michelucci
EGVM
29
0
0
22 May 2024
Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement
Igor Morawski
Kai He
Shusil Dangi
Winston H. Hsu
VLM
54
2
0
19 May 2024
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Tianhe Ren
Qing Jiang
Shilong Liu
Zhaoyang Zeng
Wenlong Liu
...
Hao Zhang
Feng Li
Peijun Tang
Kent Yu
Lei Zhang
ObjD
VLM
42
34
0
16 May 2024
Towards Consistent Object Detection via LiDAR-Camera Synergy
Kai Luo
Hao Wu
Kefu Yi
Kailun Yang
Wei Hao
Rongdong Hu
43
1
0
02 May 2024
The 8th AI City Challenge
Shuo Wang
D. Anastasiu
Zhenghang Tang
Ming-Ching Chang
Yue Yao
...
Xunlei Wu
S. Pusegaonkar
Yizhou Wang
Sujit Biswas
Rama Chellappa
38
31
0
15 Apr 2024
Is CLIP the main roadblock for fine-grained open-world perception?
Lorenzo Bianchi
F. Carrara
Nicola Messina
Fabrizio Falchi
VLM
40
4
0
04 Apr 2024
Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery
Christian Limberg
Artur Gonçalves
Bastien Rigault
Helmut Prendinger
32
5
0
02 Apr 2024
Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts
Prakash Chandra Chhipa
Kanjar De
Meenakshi Subhash Chippa
Rajkumar Saini
Marcus Liwicki
ObjD
VLM
36
1
0
01 Apr 2024
Cross-domain Multi-modal Few-shot Object Detection via Rich Text
Zeyu Shangguan
Daniel Seita
Mohammad Rostami
ObjD
52
1
0
24 Mar 2024
VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation
Hao Wang
Jiayou Qin
Ashish Bastola
Xiwen Chen
John Suchanek
Zihao Gong
Abolfazl Razi
40
15
0
19 Mar 2024
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
Tiancheng Zhao
Peng Liu
Xuan He
Lu Zhang
Kyusong Lee
ObjD
43
8
0
11 Mar 2024
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Yan Li
Weiwei Guo
Xue Yang
Ning Liao
Dunyun He
Jiaqi Zhou
Wenxian Yu
ObjD
VLM
32
7
0
20 Nov 2023
Incremental Object-Based Novelty Detection with Feedback Loop
Simone Caldarella
Elisa Ricci
Rahaf Aljundi
39
0
0
15 Nov 2023
Incremental Object Detection with CLIP
Ziyue Huang
Yupeng He
Qingjie Liu
Yunhong Wang
CLL
ObjD
VLM
26
2
0
13 Oct 2023