2303.05499
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Github (8136★)
Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 690 papers shown
Title
3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views
Xiaobiao Du
Yida Wang
Shuyun Wang
Zhuojie Wu
Hongwei Sheng
...
Ming Lu
Tianqing Zhu
Kun Zhan
Xin Yu
3DPC
98
7
0
01 Jul 2025
APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval
Hong-xia Gao
Yiming Bao
Xuezhan Tu
Bin Zhong
Minling Zhang
102
0
0
01 Jul 2025
ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models
Puhao Li
Yingying Wu
Ziheng Xi
Wanlin Li
Yuzhe Huang
...
Yinghan Chen
Jianan Wang
Song-Chun Zhu
Tengyu Liu
Siyuan Huang
LM&Ro
29
0
0
19 Jun 2025
CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity
Guang Yin
Yitong Li
Yixuan Wang
D. Mcconachie
Paarth Shah
Kunimatsu Hashimoto
Huan Zhang
Katherine Liu
Yunzhu Li
LM&Ro
17
0
0
19 Jun 2025
History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation
Mobin Habibpour
Fatemeh Afghah
LM&Ro
31
0
0
19 Jun 2025
GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
Fenghua Cheng
Jinxiang Wang
Sen Wang
Zi Huang
Xue Li
LRM
38
0
0
19 Jun 2025
KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping
Kowndinya Boyalakuntla
Abdeslam Boularias
Jingjin Yu
19
0
0
19 Jun 2025
MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System
Miaoxin Pan
Jinnan Li
Yaowen Zhang
Yi Yang
Yufeng Yue
22
0
0
18 Jun 2025
GRIM: Task-Oriented Grasping with Conditioning on Generative Examples
Shailesh
Alok Raj
Nayan Kumar
Priya Shukla
Andrew Melnik
Michael Beetz
G. C. Nandi
49
0
0
18 Jun 2025
BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion
Yuqing Lan
Chenyang Zhu
Zhirui Gao
Jiazhao Zhang
Yihan Cao
Renjiao Yi
Yijie Wang
Kai Xu
3DPC
41
0
0
18 Jun 2025
Open-World Object Counting in Videos
Niki Amini-Naieni
Andrew Zisserman
30
0
0
18 Jun 2025
Unified Representation Space for 3D Visual Grounding
Yinuo Zheng
Lipeng Gu
Honghua Chen
Liangliang Nan
Mingqiang Wei
23
0
0
17 Jun 2025
Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models
Huihan Liu
Rutav Shah
Shuijing Liu
Jack Pittenger
Mingyo Seo
Yuchen Cui
Yonatan Bisk
Roberto Martín-Martín
Yuke Zhu
40
0
0
17 Jun 2025
A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects
Guohuan Xie
Syed Ariff Syed Hesham
Wenya Guo
Bing Li
Ming-Ming Cheng
Guolei Sun
Yun-Hai Liu
39
0
0
16 Jun 2025
Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention
Jeonghoon Park
Juyoung Lee
Chaeyeon Chung
Jaeseong Lee
Jaegul Choo
Jindong Gu
28
0
0
16 Jun 2025
Uncertainty-Informed Active Perception for Open Vocabulary Object Goal Navigation
Utkarsh Bajpai
Julius Ruckin
Cyrill Stachniss
Marija Popović
32
0
0
16 Jun 2025
Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing
Zhuoying Li
Zhu Xu
Yuxin Peng
Yang Liu
18
0
0
15 Jun 2025
Retrieval Augmented Comic Image Generation
Yunhao Shui
Xuekuan Wang
Feng Qiu
Yuqiu Huang
Jinzhu Li
...
Jinru Han
Zhuo Zeng
Pengpeng Zhang
Jiarui Han
K. Sun
42
0
0
14 Jun 2025
Benchmarking Image Similarity Metrics for Novel View Synthesis Applications
Charith Wickrema
Sara Leary
Shivangi Sarkar
Mark Giglio
Eric Bianchi
Eliza Mace
Michael Twardowski
23
0
0
14 Jun 2025
DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers
Lizhen Wang
Zhurong Xia
T. Hu
P. Wang
Pengfei Wang
Zerong Zheng
Ming Zhou
DiffM
VGen
130
0
0
12 Jun 2025
SAGE: Exploring the Boundaries of Unsafe Concept Domain with Semantic-Augment Erasing
Hongguang Zhu
Y. X. Wei
Mengyu Wang
Siyu Jiao
Yan Fang
Jiannan Huang
Yao Zhao
66
0
0
11 Jun 2025
HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation
Ziyao Huang
Zixiang Zhou
Juan Cao
Yifeng Ma
Yi Chen
...
Hongmei Wang
Qin Lin
Yuan Zhou
Qinglin Lu
Fan Tang
VGen
43
0
0
10 Jun 2025
Open World Scene Graph Generation using Vision Language Models
Amartya Dutta
Kazi Sajeed Mehrab
Medha Sawhney
Abhilash Neog
Mridul Khurana
...
Aanish Pradhan
M. Maruf
Ismini Lourentzou
Arka Daw
Anuj Karpatne
VLM
32
0
0
09 Jun 2025
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Teng Hu
Zhentao Yu
Zhengguang Zhou
Jiangning Zhang
Yuan Zhou
Qinglin Lu
Ran Yi
VGen
26
0
0
09 Jun 2025
Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods
Beining Xu
Junxian Li
19
0
0
09 Jun 2025
Dreamland: Controllable World Creation with Simulator and Generative Models
Sicheng Mo
Ziyang Leng
Leon Liu
Weizhen Wang
Honglin He
Bolei Zhou
VGen
16
0
0
09 Jun 2025
OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting
Jens Piekenbrinck
Christian Schmidt
Alexander Hermans
Narunas Vaskevicius
Timm Linder
Bastian Leibe
3DGS
VLM
19
0
0
09 Jun 2025
SAM2Auto: Auto Annotation Using FLASH
Arash Rocky
Q.M. Jonathan Wu
VGen
VLM
38
0
0
09 Jun 2025
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation
Chao Yin
Hao Li
Kequan Yang
Jide Li
Pinpin Zhu
Xiaoqiang Li
24
0
0
07 Jun 2025
IRS: Instance-Level 3D Scene Graphs via Room Prior Guided LiDAR-Camera Fusion
Hongming Chen
Yiyang Lin
Ziliang Li
Biyu Ye
Y. Zhang
Ximin Lyu
3DV
21
0
0
07 Jun 2025
Advancement and Field Evaluation of a Dual-arm Apple Harvesting Robot
Keyi Zhu
Kyle Lammers
Kaixiang Zhang
Chaaran Arunachalam
Siddhartha Bhattacharya
Jiajia Li
R. Lu
Zhaojian Li
52
0
0
06 Jun 2025
Splat and Replace: 3D Reconstruction with Repetitive Elements
Nicolás Violante
Andréas Meuleman
Alban Gauthier
F. Durand
Thibault Groueix
G. Drettakis
3DGS
42
0
0
06 Jun 2025
Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection
Yu Li
Xingyu Qiu
Yuqian Fu
Jie Chen
Tianwen Qian
...
Danda Pani Paudel
Yanwei Fu
Xuanjing Huang
Luc Van Gool
Yu-Gang Jiang
72
0
0
06 Jun 2025
MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping
V. Bhat
Naman Patel
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
19
0
0
06 Jun 2025
You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping
Jingshun Huang
Haitao Lin
Tianyu Wang
Yanwei Fu
Yu Jiang
Xiangyang Xue
52
0
0
06 Jun 2025
Textile Analysis for Recycling Automation using Transfer Learning and Zero-Shot Foundation Models
Yannis Spyridis
Vasileios Argyriou
22
0
0
06 Jun 2025
AssetDropper: Asset Extraction via Diffusion Models with Reward-Driven Optimization
Lanjiong Li
Guanhua Zhao
Lingting Zhu
Zeyu Cai
Lequan Yu
Jian Zhang
Zeyu Wang
26
0
0
06 Jun 2025
ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation
Zhuoyun Zhong
Seyedali Golestaneh
Constantinos Chamzas
152
0
0
05 Jun 2025
Handle-based Mesh Deformation Guided By Vision Language Model
Xingpeng Sun
Shiyang Jia
Zherong Pan
Kui Wu
Aniket Bera
89
0
0
05 Jun 2025
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction
Zesheng Ye
C. Cai
Ruijiang Dong
Jianzhong Qi
Lei Feng
Pin-Yu Chen
Feng Liu
234
0
0
05 Jun 2025
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Quan Shi
Carlos E. Jimenez
Shunyu Yao
Nick Haber
Diyi Yang
Karthik Narasimhan
49
0
0
05 Jun 2025
Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding
Yani Zhang
Dongming Wu
Hao Shi
Yingfei Liu
Tiancai Wang
Haoqiang Fan
Xingping Dong
ObjD
120
0
0
05 Jun 2025
Understanding Physical Properties of Unseen Deformable Objects by Leveraging Large Language Models and Robot Actions
Changmin Park
Beomjoon Lee
Haechan Jung
Haejin Jung
Changjoo Nam
LM&Ro
114
0
0
04 Jun 2025
Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision
Tomoya Yoshida
Shuhei Kurita
Taichi Nishimura
Shinsuke Mori
83
0
0
04 Jun 2025
Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models
Fangrui Zhu
Hanhui Wang
Yiming Xie
Jing Gu
Tianye Ding
Jianwei Yang
Huaizu Jiang
3DV
LRM
116
0
0
04 Jun 2025
Object-level Self-Distillation for Vision Pretraining
Çağlar Hızlı
Çağatay Yıldız
Pekka Marttinen
OCL
VLM
52
0
0
04 Jun 2025
SemNav: A Model-Based Planner for Zero-Shot Object Goal Navigation Using Vision-Foundation Models
Arnab Debnath
Gregory J. Stein
Jana Kosecka
LM&Ro
96
0
0
04 Jun 2025
Sign Language: Towards Sign Understanding for Robot Autonomy
Ayush Agrawal
Joel Loo
Nicky Zimmerman
David Hsu
SLR
89
0
0
03 Jun 2025
Auto-Labeling Data for Object Detection
Brent A. Griffin
Manushree Gangwar
Jacob Sela
Jason J. Corso
ObjD
VLM
76
0
0
03 Jun 2025
SAVOR: Skill Affordance Learning from Visuo-Haptic Perception for Robot-Assisted Bite Acquisition
Zhanxin Wu
Bo Ai
Tom Silver
Tapomayukh Bhattacharjee
47
1
0
03 Jun 2025