Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.05499
Cited By
v1
v2
v3
v4 (latest)
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Re-assign community
ArXiv (abs)
PDF
HTML
Github (8136★)
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 691 papers shown
Title
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
Lvpan Cai
Haowei Wang
Jiayi Ji
YanShu ZhouMen
Yiwei Ma
Xiaoshuai Sun
Liujuan Cao
Rongrong Ji
ViT
90
1
0
16 Apr 2025
MediSee: Reasoning-based Pixel-level Perception in Medical Images
Qinyue Tong
Ziqian Lu
Jun Liu
Yangming Zheng
Zheming Lu
LRM
142
0
0
15 Apr 2025
IlluSign: Illustrating Sign Language Videos by Leveraging the Attention Mechanism
Janna Bruner
Amit Moryossef
Lior Wolf
DiffM
SLR
96
1
0
15 Apr 2025
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Hyunwoo Oh
Yezi Liu
Tamoghno Das
Mohsen Imani
VLM
LRM
84
0
0
15 Apr 2025
Weather-Aware Object Detection Transformer for Domain Adaptation
Soheil Gharatappeh
Salimeh Yasaei Sekeh
Vikas Dhiman
ViT
73
0
0
15 Apr 2025
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
Xuelong Li
Tao Zhang
Lu Qi
Ming-Hsuan Yang
104
1
0
15 Apr 2025
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization
Darryl Hannan
John Cooper
Dylan White
Timothy Doster
Henry Kvinge
Y. Watkins
72
0
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
233
132
1
14 Apr 2025
Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation
Yu Hao
Geeta Chandra Raju Bethala
Niraj Pudasaini
Hao Huang
Shuaihang Yuan
Congcong Wen
Baoru Huang
A. Nguyen
Yi Fang
LM&Ro
AI4CE
LRM
96
1
0
13 Apr 2025
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Haotian Xu
Yue Hu
Chen Gao
Zhengqiu Zhu
Yong Zhao
Yongqian Li
Quanjun Yin
139
2
0
13 Apr 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
Jing Zhang
...
Jiahui Lv
Ziqiang Liu
Tengyuan Shi
Qingjie Liu
Yansen Wang
MLLM
VLM
130
2
0
13 Apr 2025
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
Jialu Li
Shoubin Yu
Han Lin
Jaemin Cho
Jaehong Yoon
Joey Tianyi Zhou
DiffM
VGen
114
3
0
11 Apr 2025
Diffusion Models for Robotic Manipulation: A Survey
Rosa Wolf
Yitian Shi
Sheng Liu
Rania Rayyes
127
2
0
11 Apr 2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang
Haonan Lin
Yanning Zhou
Kaiwen Xiao
110
1
0
10 Apr 2025
POEM: Precise Object-level Editing via MLLM control
Marco Schouten
Mehmet Onurcan Kaya
Serge Belongie
Dim P. Papadopoulos
DiffM
103
0
0
10 Apr 2025
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu
Qizhi Chen
Zechuan Wang
Yiwen Tang
Yiting Zhang
Chi Yan
Dong Wang
Xiaochen Li
Bin Zhao
CoGe
162
0
0
10 Apr 2025
Compass Control: Multi Object Orientation Control for Text-to-Image Generation
Rishubh Parihar
Vaibhav Agrawal
Sachidanand VS
R. V. Babu
DiffM
124
0
0
09 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Yansen Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
140
2
0
09 Apr 2025
Few-Shot Adaptation of Grounding DINO for Agricultural Domain
Rajhans Singh
Rafael Bidese Puhl
Kshitiz Dhakal
Sudhir Sornapudi
83
0
0
09 Apr 2025
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration
Omar Alama
A. Bhattacharya
Haoyang He
Seungchan Kim
Yuheng Qiu
Wenshan Wang
Cherie Ho
Nikhil Varma Keetha
Sebastian A. Scherer
69
1
0
09 Apr 2025
Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection
Ruoyu Chen
Hua Zhang
Jingzhi Li
Li Liu
Zhen Huang
Xiaochun Cao
86
1
0
09 Apr 2025
Resource-efficient Inference with Foundation Model Programs
Lunyiu Nie
Zhimin Ding
Kevin Yu
Marco Cheung
C. Jermaine
S. Chaudhuri
78
0
0
09 Apr 2025
MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos
Alexey Gavryushin
Xi Wang
Robert J. S. Malate
Chenyu Yang
Xiaojun Jia
Shubh Goel
Davide Liconti
René Zurbrugg
Robert K. Katzschmann
Marc Pollefeys
94
2
0
08 Apr 2025
Measuring Déjà vu Memorization Efficiently
Narine Kokhlikyan
Bargav Jayaraman
Florian Bordes
Chuan Guo
Kamalika Chaudhuri
69
1
0
08 Apr 2025
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang
Ahmet Iscen
Eunchan Jo
Sua Choi
Minsu Cho
Cordelia Schmid
VLM
KELM
OffRL
126
0
0
08 Apr 2025
On the Importance of Conditioning for Privacy-Preserving Data Augmentation
Julian Lorenz
K. Ludwig
Valentin Haug
Rainer Lienhart
DiffM
82
0
0
08 Apr 2025
Texture2LoD3: Enabling LoD3 Building Reconstruction With Panoramic Images
Wenzhao Tang
Weihang Li
Xiucheng Liang
Olaf Wysocki
Filip Biljecki
Christoph Holst
Boris Jutzi
87
1
0
07 Apr 2025
Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection
Jiancheng Pan
Yanxing Liu
Xiao He
Long Peng
Jiahao Li
Yuze Sun
Xiaomeng Huang
80
2
0
06 Apr 2025
Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images
Hamza Riaz
Alan F. Smeaton
88
0
0
05 Apr 2025
Multi-identity Human Image Animation with Structural Video Diffusion
Zhenzhi Wang
Yongqian Li
Yanhong Zeng
Yuwei Guo
Dahua Lin
Tianfan Xue
Bo Dai
VGen
78
2
0
05 Apr 2025
Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model
Kotaro Ikeda
Masanori Koyama
Jinzhe Zhang
Kohei Hayashi
Kenji Fukumizu
OT
568
1
0
04 Apr 2025
Deep Reinforcement Learning via Object-Centric Attention
Johannes Czech
Cedric Derstroff
Bjarne Gregori
Elisabeth Dillies
Quentin Delfosse
Kristian Kersting
OCL
88
0
0
03 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
104
0
0
03 Apr 2025
MinkOcc: Towards real-time label-efficient semantic occupancy prediction
Samuel Sze
Daniele De Martini
Lars Kunze
3DPC
106
0
0
03 Apr 2025
Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments
Chenyu Zhang
Daniil Cherniavskii
Andrii Zadaianchuk
Antonios Tragoudaras
Antonios Vozikis
Thijmen Nijdam
Derck W. E. Prinzhorn
Mark Bodracska
N. Sebe
E. Gavves
EGVM
VGen
105
0
0
03 Apr 2025
BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation
Van Nguyen Nguyen
Stephen Tyree
Andrew Guo
Mederic Fourmy
Anas Gouda
...
Stan Birchfield
Jiri Matas
Yann Labbé
M. Sundermeyer
Tomás Hodan
3DPC
158
4
0
03 Apr 2025
Multi-party Collaborative Attention Control for Image Customization
Han Yang
Chuanguang Yang
Qiuli Wang
Zhulin An
Weilun Feng
Libo Huang
Yongjun Xu
DiffM
117
1
0
02 Apr 2025
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang
Duo Peng
Feng Chen
Yue Yang
Yinjie Lei
DiffM
149
0
0
02 Apr 2025
UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting
Jaehoon Choi
Dongki Jung
Yonghan Lee
Sungmin Eum
Dinesh Manocha
H. Kwon
3DGS
113
0
0
02 Apr 2025
Pro-DG: Procedural Diffusion Guidance for Architectural Facade Generation
Aleksander Plocharski
Jan Swidzinski
Przemyslaw Musialski
DiffM
52
0
0
02 Apr 2025
Multimodal Reference Visual Grounding
Yangxiao Lu
Ruosen Li
Liqiang Jing
Jikai Wang
Xinya Du
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
ObjD
122
0
0
02 Apr 2025
WorldScore: A Unified Evaluation Benchmark for World Generation
Haoyi Duan
Hong-Xing Yu
Sirui Chen
L. Fei-Fei
Jiajun Wu
VGen
150
8
0
01 Apr 2025
Global Intervention and Distillation for Federated Out-of-Distribution Generalization
Zhuang Qi
Runhui Zhang
Lei Meng
Wei Wu
Yachong Zhang
Xiangxu Meng
FedML
137
1
0
01 Apr 2025
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices
Bosung Kim
Kyuhwan Lee
Isu Jeong
Jungmin Cheon
Yeojin Lee
Seulki Lee
VGen
110
0
0
31 Mar 2025
CrowdVLM-R1: Expanding R1 Ability to Vision Language Model for Crowd Counting using Fuzzy Group Relative Policy Reward
Zhiqiang Wang
Pengbin Feng
Yanbin Lin
Shuzhang Cai
Zongao Bian
Jinghua Yan
Xingquan Zhu
92
4
0
31 Mar 2025
DenseFormer: Learning Dense Depth Map from Sparse Depth and Image via Conditional Diffusion Model
Ming Yuan
Sichao Wang
Chuang Zhang
Lei He
Qing Xu
Jianqiang Wang
DiffM
MDE
83
0
0
31 Mar 2025
Detecting Glioma, Meningioma, and Pituitary Tumors, and Normal Brain Tissues based on Yolov11 and Yolov8 Deep Learning Models
Ahmed M. Taha
Salah A. Aly
Mohamed F. Darwish
62
0
0
31 Mar 2025
Consistent Subject Generation via Contrastive Instantiated Concepts
Lee Hsin-Ying
Kelvin Chan
Ming-Hsuan Yang
DiffM
160
0
0
31 Mar 2025
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
Tianming Liang
Haichao Jiang
Wei-Shi Zheng
Jian-Fang Hu
113
0
0
30 Mar 2025
EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing
Hongxiang Jiang
Jihao Yin
Qixiong Wang
Jiaqi Feng
Guo Chen
103
1
0
30 Mar 2025
Previous
1
2
3
4
5
6
...
12
13
14
Next