2303.05499
Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Github (8136★)
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 690 papers shown
Title
SingaKids: A Multilingual Multimodal Dialogic Tutor for Language Learning
Zhengyuan Liu
Geyu Lin
Hui Li Tan
Huayun Zhang
Yanfeng Lu
...
Stella Xin Yin
He Sun
Hock Huan Goh
Lung Hsiang Wong
Nancy F. Chen
49
0
0
03 Jun 2025
Sign Language: Towards Sign Understanding for Robot Autonomy
Ayush Agrawal
Joel Loo
Nicky Zimmerman
David Hsu
SLR
89
0
0
03 Jun 2025
SAVOR: Skill Affordance Learning from Visuo-Haptic Perception for Robot-Assisted Bite Acquisition
Zhanxin Wu
Bo Ai
Tom Silver
Tapomayukh Bhattacharjee
47
1
0
03 Jun 2025
Towards Auto-Annotation from Annotation Guidelines: A Benchmark through 3D LiDAR Detection
Yechi Ma
Wei Hua
Shu Kong
68
0
0
03 Jun 2025
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
Jiajun Jiang
Yiming Zhu
Zirui Wu
Jie Song
79
0
0
02 Jun 2025
No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond
Tomasz Stanczyk
Seongro Yoon
François Brémond
77
0
0
02 Jun 2025
Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
Taehoon Yoon
Yunhong Min
Kyeongmin Yeo
Minhyuk Sung
98
0
0
02 Jun 2025
Fire360: A Benchmark for Robust Perception and Episodic Memory in Degraded 360-Degree Firefighting Videos
Aditi Tiwari
Farzaneh Masoud
Dac Trong Nguyen
Jill Kraft
Heng Ji
Klara Nahrstedt
45
0
0
02 Jun 2025
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
Yuyuan Liu
Yuanhong Chen
Chong Wang
Junlin Han
Junde Wu
Can Peng
Jingkun Chen
Yu Tian
Gustavo Carneiro
VLM
56
0
0
01 Jun 2025
Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning
Sara Ghazanfari
Francesco Croce
Nicolas Flammarion
Prashanth Krishnamurthy
Farshad Khorrami
S. Garg
LRM
35
0
0
31 May 2025
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
Zeqi Gu
Yin Cui
Zhaoshuo Li
Fangyin Wei
Yunhao Ge
Jinwei Gu
Ming-Yu Liu
Abe Davis
Yifan Ding
36
0
0
31 May 2025
Common Inpainted Objects In-N-Out of Context
Tianze Yang
Tyson Jordan
Ninghao Liu
Jin Sun
43
0
0
31 May 2025
iDPA: Instance Decoupled Prompt Attention for Incremental Medical Object Detection
Huahui Yi
Wei Xu
Ziyuan Qin
Xi Chen
Xiaohu Wu
Kang Li
Qicheng Lao
VLM
40
0
0
31 May 2025
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
Cailin Zhuang
Ailin Huang
Wei Cheng
J. Wu
Yaoqi Hu
...
Hengyuan Xu
Xuanyang Zhang
Xianfang Zeng
Gang Yu
Fangqiu Yi
CoGe
70
2
0
30 May 2025
SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping
Mingxu Zhang
Xiaoqi Li
Jiahui Xu
Kaichen Zhou
Hojin Bae
Yan Shen
Chuyan Xiong
Jiaming Liu
73
0
0
30 May 2025
GenSpace: Benchmarking Spatially-Aware Image Generation
Zehan Wang
Jiayang Xu
Ziang Zhang
Tianyu Pan
Chao Du
Hengshuang Zhao
Zhou Zhao
EGVM
57
0
0
30 May 2025
Mobi-π: Mobilizing Your Robot Learning Policy
Jingyun Yang
Isabella Huang
Brandon Vu
Max Bajracharya
Rika Antonova
Jeannette Bohg
49
0
0
29 May 2025
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
Chenbin Pan
Wenbin He
Zhengzhong Tu
Liu Ren
LRM
VLM
82
0
0
29 May 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM
LRM
80
0
0
29 May 2025
RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer
Liu Liu
Xiaofeng Wang
Guosheng Zhao
Keyu Li
Wenkang Qin
Jiaxiong Qiu
Zheng Hua Zhu
Guan Huang
Zhizhong Su
VGen
86
1
0
29 May 2025
VModA: An Effective Framework for Adaptive NSFW Image Moderation
Han Bao
Qinying Wang
Zhi Chen
Qingming Li
Xuhong Zhang
Changjiang Li
Zonghui Wang
Shouling Ji
Wenzhi Chen
45
0
0
29 May 2025
Stairway to Success: Zero-Shot Floor-Aware Object-Goal Navigation via LLM-Driven Coarse-to-Fine Exploration
Zeying Gong
Rong Li
Tianshuai Hu
Ronghe Qiu
Lingdong Kong
Lingfeng Zhang
Yiyi Ding
Leying Zhang
Junwei Liang
70
0
0
29 May 2025
MAGREF: Masked Guidance for Any-Reference Video Generation
Yufan Deng
Xun Guo
Yuanyang Yin
Jacob Zhiyuan Fang
Yiding Yang
...
Shenghai Yuan
Angtian Wang
Bo Liu
Haibin Huang
Chongyang Ma
DiffM
VGen
VOS
86
1
0
29 May 2025
Learning Compositional Behaviors from Demonstration and Language
Weiyu Liu
Neil Nie
Ruohan Zhang
Jiayuan Mao
Jiajun Wu
LM&Ro
70
6
0
28 May 2025
Improving Contrastive Learning for Referring Expression Counting
Kostas Triaridis
Panagiotis Kaliosis
E-Ro Nguyen
Aoxiang Fan
Hieu M. Le
Dimitris Samaras
SSL
67
0
0
28 May 2025
Cross-DINO: Cross the Deep MLP and Transformer for Small Object Detection
Guiping Cao
Wenjian Huang
X. Lan
Jianguo Zhang
D. Jiang
Yaowei Wang
ViT
49
0
0
28 May 2025
UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments
W. Zheng
L. Ou
Jiajie He
Libo Zhou
Xinyi Yu
Yan Wei
3DGS
56
0
0
28 May 2025
CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation
Pardis Taghavi
Tian Liu
Renjie Li
Reza Langari
Zhengzhong Tu
ISeg
89
0
0
28 May 2025
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
Eric Xing
Pranavi Kolouju
Robert Pless
Abby Stylianou
Nathan Jacobs
28
0
0
27 May 2025
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Shurong Zheng
Fan Yang
Ming Tang
Jinqiao Wang
VLM
LRM
66
0
0
27 May 2025
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Peter Robicheaux
Matvei Popov
Anish Madan
Isaac Robinson
Joseph Nelson
Deva Ramanan
Neehar Peri
ObjD
VLM
111
3
0
27 May 2025
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Muzhi Zhu
Hao Zhong
Canyu Zhao
Zongze Du
Zheng Huang
...
Hao Chen
Cheng Zou
Jingdong Chen
Ming-Hsuan Yang
Chunhua Shen
LRM
178
0
0
27 May 2025
ISAC: Training-Free Instance-to-Semantic Attention Control for Improving Multi-Instance Generation
Sanghyun Jo
Wooyeol Lee
Ziseok Lee
Kyungsu Kim
803
0
0
27 May 2025
RefAV: Towards Planning-Centric Scenario Mining
Cainan Davidson
Deva Ramanan
Neehar Peri
89
2
0
27 May 2025
Open-Det: An Efficient Learning Framework for Open-Ended Detection
Guiping Cao
Tao Wang
Wenjian Huang
X. Lan
Jianguo Zhang
D. Jiang
ObjD
VLM
28
0
0
27 May 2025
CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features
X. Feng
D. Zhang
Shuyan Hu
X. Li
M. Wu
Jie Zhang
Xiaosha Chen
Kexin Huang
64
0
0
26 May 2025
Electrolyzers-HSI: Close-Range Multi-Scene Hyperspectral Imaging Benchmark Dataset
Elias Arbash
Ahmed J. Afifi
Ymane Belahsen
Margret Fuchs
Pedram Ghamisi
P. Scheunders
R. Gloaguen
41
0
0
26 May 2025
DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
Ruiqi Wu
Xinjie Wang
Liu Liu
Chunle Guo
Jiaxiong Qiu
Chongyi Li
Lichao Huang
Zhizhong Su
Ming-Ming Cheng
VGen
98
1
0
26 May 2025
Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models
Nanxing Hu
Xiaoyue Duan
Jinchao Zhang
Guoliang Kang
MLLM
76
0
0
26 May 2025
LlamaSeg: Image Segmentation via Autoregressive Mask Generation
Jiru Deng
Tengjin Weng
Tianyu Yang
Wenhan Luo
Zhiheng Li
Wenhao Jiang
VLM
154
0
0
26 May 2025
FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields
Lukas Meyer
Andrei-Timotei Ardelean
Tim Weyrich
Marc Stamminger
49
0
0
26 May 2025
From Data to Modeling: Fully Open-vocabulary Scene Graph Generation
Zuyao Chen
Jinlin Wu
Zhen Lei
Chang Wen Chen
53
0
0
26 May 2025
In-Context Brush: Zero-shot Customized Subject Insertion with Context-Aware Latent Space Manipulation
Yu Xu
Fan Tang
You Wu
Lin Gao
Oliver Deussen
Hongbin Yan
Jintao Li
Juan Cao
Tong-Yee Lee
DiffM
54
0
0
26 May 2025
How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
Yining Pan
Qiongjie Cui
Xulei Yang
Na Zhao
61
0
0
25 May 2025
CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
Yongheng Zhang
Xu Liu
Ruoxi Zhou
Qiguang Chen
Hao Fei
Wenpeng Lu
L. Qin
HILM
LRM
41
0
0
25 May 2025
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance
Mohammad Mahdi Moradi
Sudhir Mudur
102
0
0
25 May 2025
CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
H. Zhang
Dexiang Hong
Maoke Yang
Yutao Chen
Zhao Zhang
Jie Shao
Xinglong Wu
Zuxuan Wu
Yu Jiang
DiffM
AI4CE
191
0
0
25 May 2025
SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes
Dicong Qiu
Jiadi You
Zeying Gong
Ronghe Qiu
Hui Xiong
Junwei Liang
39
0
0
24 May 2025
CU-Multi: A Dataset for Multi-Robot Data Association
Doncey Albin
Miles Mena
Annika Thomas
Harel Biggie
Xuefei Sun
Dusty Woods
Steve McGuire
Christoffer Heckman
81
0
0
23 May 2025
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
Zifu Wan
Yaqi Xie
Ce Zhang
Zhiqiu Lin
Zihan Wang
Simon Stepputtis
Deva Ramanan
Katia Sycara
34
0
0
23 May 2025