Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.05499
Cited By
v1
v2
v3
v4 (latest)
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Re-assign community
ArXiv (abs)
PDF
HTML
Github (8136★)
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 691 papers shown
Title
EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing
Hongxiang Jiang
Jihao Yin
Qixiong Wang
Jiaqi Feng
Guo Chen
103
1
0
30 Mar 2025
Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts
Jianhua Sun
Jiude Wei
Yongqian Li
Cewu Lu
LM&Ro
100
1
0
30 Mar 2025
Object Isolated Attention for Consistent Story Visualization
Xiangyang Luo
Junhao Cheng
Yifan Xie
Xin Zhang
Tao Feng
Ziqiang Liu
Fei Ma
Fei Richard Yu
DiffM
110
6
0
30 Mar 2025
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva
Andrew Zisserman
110
1
0
30 Mar 2025
Efficient Adaptation For Remote Sensing Visual Grounding
Hasan Moughnieh
Mohamad Chalhoub
Hasan Nasrallah
Cristiano Nattero
Paolo Campanella
Giovanni Nico
A. Ghandour
123
0
0
29 Mar 2025
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving
Fuhao Li
Huan Jin
Bin-Bin Gao
Liaoyuan Fan
Lihui Jiang
Long Zeng
139
2
0
28 Mar 2025
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Yunhong Min
Daehyeon Choi
Kyeongmin Yeo
Jihyun Lee
Minhyuk Sung
116
0
0
28 Mar 2025
Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges
Ukcheol Shin
Jinsun Park
3DV
MDE
83
0
0
28 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Yiming Lei
Chenkai Zhang
Zeming Liu
Qingjie Liu
Yansen Wang
155
2
0
28 Mar 2025
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao
Albert Zhai
Shenlong Wang
VGen
149
2
0
27 Mar 2025
Cultivating Game Sense for Yourself: Making VLMs Gaming Experts
Wenxuan Lu
Jiangyang He
Zhanqiu Zhang
Yiwen Guo
Tianning Zang
81
0
0
27 Mar 2025
GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection
Xingyu Peng
Si Liu
Chen Gao
Yan Bai
Beipeng Mu
Xiaofei Wang
Huaxia Xia
123
0
0
26 Mar 2025
Robust Flower Cluster Matching Using The Unscented Transform
Andy Chu
Rashik Shrestha
Yu Gu
Jason N. Gross
90
0
0
26 Mar 2025
VideoGEM: Training-free Action Grounding in Videos
Felix Vogel
Walid Bousselham
Anna Kukleva
Nina Shvetsova
Hilde Kuehne
LM&Ro
VLM
158
0
0
26 Mar 2025
LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions
Yejin Kwon
Daeun Moon
Youngje Oh
Hyunsoo Yoon
155
0
0
26 Mar 2025
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang
Jinghao Li
Yu-Wing Tai
DiffM
173
2
0
25 Mar 2025
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim
Taehoon Yoon
Jisung Hwang
Minhyuk Sung
DiffM
179
3
0
25 Mar 2025
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding
Hao Guo
Jianfei Zhu
Wei Fan
Chunzhi Yi
Feng Jiang
ObjD
97
0
0
25 Mar 2025
Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
Sangwon Beak
Hyeonwoo Kim
Hanbyul Joo
108
0
0
25 Mar 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
175
1
0
25 Mar 2025
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay Kulkarni
Ge Yan
Chung-En Sun
Tuomas P. Oikarinen
Tsui-Wei Weng
77
0
0
25 Mar 2025
Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
Jingyu Liu
Zijie Xin
Yuhan Fu
Ruixiang Zhao
Bangxiang Lan
Xirong Li
66
0
0
25 Mar 2025
CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection
Zhichao Sun
Huazhang Hu
Yidong Ma
Gang Liu
Nemo Chen
Xu Tang
Feng-Long Xie
Yongchao Xu
ObjD
129
0
0
24 Mar 2025
MaSS13K: A Matting-level Semantic Segmentation Benchmark
C. Xie
Minghan Li
Hui Zeng
Jun Luo
Lei Zhang
VLM
176
0
0
24 Mar 2025
OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning
Hui Li
Congcong Bian
Zeyang Zhang
Xiaoning Song
Xi Li
Xiao Wu
88
0
0
24 Mar 2025
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang
Alexandros Delitzas
Fangjinhua Wang
Ruida Zhang
Xiangyang Ji
Marc Pollefeys
Francis Engelmann
3DV
3DPC
134
4
0
24 Mar 2025
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
Bin Li
Dehong Gao
Yeyuan Wang
Linbo Jin
Shanqing Yu
Xiaoyan Cai
Libin Yang
VLM
93
0
0
24 Mar 2025
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Yufei Zhan
Yousong Zhu
Shurong Zheng
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
VLM
123
19
0
23 Mar 2025
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
Oucheng Huang
Yuhang Ma
Zeng Zhao
Mingrui Wu
Jiayi Ji
Rongsheng Zhang
Zhibo Hu
Xiaoshuai Sun
Rongrong Ji
83
1
0
22 Mar 2025
Enhancing Martian Terrain Recognition with Deep Constrained Clustering
Tejas Panambur
M. Parente
78
0
0
22 Mar 2025
MagicColor: Multi-Instance Sketch Colorization
Yize Zhang
Yue Ma
Bingyuan Wang
Qifeng Chen
Zeyu Wang
DiffM
136
4
0
21 Mar 2025
Is there anything left? Measuring semantic residuals of objects removed from 3D Gaussian Splatting
Simona Kocour
Assia Benbihi
Aikaterini Adam
Torsten Sattler
3DPC
93
0
0
21 Mar 2025
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
85
3
0
20 Mar 2025
Single Image Iterative Subject-driven Generation and Editing
Yair Shpitzer
Gal Chechik
Idan Schwartz
93
0
0
20 Mar 2025
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li
Mi Zhang
Yiming Sun
Min Yang
DiffM
89
2
0
19 Mar 2025
Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark
Ying Liu
Yijing Hua
Haojiang Chai
Yanbo Wang
TengQi Ye
ObjD
103
0
0
19 Mar 2025
A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees
Faseeh Ahmad
Hashim Ismail
Jonathan Styrud
Maj Stenmark
Volker Krueger
84
1
0
19 Mar 2025
xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion
Saad Lahlali
Sandra Kara
Hejer Ammar
Florian Chabot
Nicolas Granger
Hervé Le Borgne
Q. C. Pham
3DPC
107
0
0
19 Mar 2025
GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback
Sungjae Lee
Yeonjoo Hong
Kwang In KIm
85
0
0
19 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
113
0
0
19 Mar 2025
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Nvidia
Hassan Abu Alhaija
Jose M. Alvarez
Maciej Bala
Tiffany Cai
...
Yuchong Ye
Xiaodong Yang
Boxin Wang
Fangyin Wei
Yu Zeng
VGen
179
8
0
18 Mar 2025
HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard
Yifei Dong
Fengyi Wu
Qi He
Heng Li
Minghan Li
...
Yuxuan Zhou
Jingdong Sun
Qi Dai
Zhi-Qi Cheng
Alexander G. Hauptmann
LM&Ro
87
0
0
18 Mar 2025
LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation
Yang Zhou
Shiyu Zhao
Yuxiao Chen
Zhenting Wang
Can Jin
Dimitris N. Metaxas
ObjD
171
0
0
18 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu Cheng
VLM
185
1
0
17 Mar 2025
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode
Junjia Huang
Pengxiang Yan
Jinhang Cai
Jiyang Liu
Zhao Wang
Yitong Wang
Xinglong Wu
Guanbin Li
DiffM
93
0
0
17 Mar 2025
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
Zilun Zhang
Haozhan Shen
Tiancheng Zhao
Bin Chen
Zian Guan
Yuhao Wang
Xu Jia
Yuhao Wang
Yongheng Shang
Yuxiang Cai
90
0
0
16 Mar 2025
Exploring Contextual Attribute Density in Referring Expression Counting
Zhicheng Wang
Zhiyu Pan
Zhan Peng
Jian Cheng
Liwen Xiao
Wei Jiang
Zhiguo Cao
76
0
0
16 Mar 2025
VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility
Yitian Shi
Di Wen
Guanqi Chen
Edgar Welte
Sheng Liu
Kunyu Peng
Rainer Stiefelhagen
Rania Rayyes
118
3
0
16 Mar 2025
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
Yuheng Jiang
Zhehao Shen
Chengcheng Guo
Yu Hong
Zhuo Su
Yize Zhang
Marc Habermann
Lan Xu
138
2
0
15 Mar 2025
ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis
Yu Fang
Yue Yang
Xinghao Zhu
Kaiyuan Zheng
Gedas Bertasius
D. Szafir
Mingyu Ding
94
3
0
15 Mar 2025
Previous
1
2
3
...
5
6
7
...
12
13
14
Next