2303.05499
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,337 papers shown
Title
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
Zebin Yao
Lei Ren
Huixing Jiang
Chen Wei
Xiaojie Wang
Ruifan Li
Fangxiang Feng
DiffM
76
0
0
22 Apr 2025
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
Jinda Lu
Jinghan Li
Yuan Gao
Junkang Wu
Jiancan Wu
Xuben Wang
Xiangnan He
156
0
0
22 Apr 2025
DRAWER: Digital Reconstruction and Articulation With Environment Realism
Hongchi Xia
Entong Su
Marius Memmel
Arhan Jain
Raymond Yu
Numfor Mbiziwo-Tiapo
Ali Farhadi
Abhishek Gupta
Shenlong Wang
Wei-Chiu Ma
VGen
38
1
0
21 Apr 2025
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li
Jinglin Xu
Yunzhen Zhao
Yuxin Peng
ObjD
32
0
0
21 Apr 2025
Object-Level Verbalized Confidence Calibration in Vision-Language Models via Semantic Perturbation
Yunpu Zhao
Rui Zhang
Junbin Xiao
Ruibo Hou
Jiaming Guo
Zihao Zhang
Yifan Hao
Yunji Chen
38
0
0
21 Apr 2025
Insert Anything: Image Insertion via In-Context Editing in DiT
Wensong Song
Hong Jiang
Zongxing Yang
Ruijie Quan
Yi Yang
DiffM
45
0
0
21 Apr 2025
Emergence and Evolution of Interpretable Concepts in Diffusion Models
Berk Tinaz
Zalan Fabian
Mahdi Soltanolkotabi
DiffM
26
0
0
21 Apr 2025
ApexNav: An Adaptive Exploration Strategy for Zero-Shot Object Navigation with Target-centric Semantic Fusion
Mingjie Zhang
Yuheng Du
Chengkai Wu
Jinni Zhou
Zhenchao Qi
Jun Ma
Boyu Zhou
34
0
0
20 Apr 2025
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
Weijun Zhuang
Qizhang Li
Xin Li
Ming-Yu Liu
Xiaopeng Hong
Feng Gao
Fan Yang
W. Zuo
35
0
0
20 Apr 2025
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
Jiachen Li
Qing Xie
Xiaohan Yu
Hongyun Wang
Jinyu Xu
Yongjian Liu
ObjD
81
0
0
20 Apr 2025
SG-Reg: Generalizable and Efficient Scene Graph Registration
Chuhao Liu
Zhijian Qiao
Jieqi Shi
Ke Wang
Peize Liu
Shaojie Shen
31
0
0
20 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
46
0
0
19 Apr 2025
ESPLoRA: Enhanced Spatial Precision with Low-Rank Adaption in Text-to-Image Diffusion Models for High-Definition Synthesis
Andrea Rigo
Luca Stornaiuolo
Mauro Martino
Bruno Lepri
N. Sebe
50
0
0
18 Apr 2025
Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation
SoYoung Park
Hyewon Lee
M. Choi
Seunghoon Han
Jong-Ryul Lee
Sungsu Lim
Tae-Ho Kim
VLM
57
0
0
18 Apr 2025
BeetleVerse: A study on taxonomic classification of ground beetles
S M Rayeed
Alyson East
Samuel Stevens
Sydne Record
Charles V. Stewart
28
0
0
18 Apr 2025
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun
Junbin Xiao
Tze Ho Elden Tse
Yicong Li
Arjun Akula
Angela Yao
EgoV
52
0
0
18 Apr 2025
Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes
Sridevi Polavaram
Xin Zhou
Meenu Ravi
Mohammad Zarei
Anmol Srivastava
21
0
0
18 Apr 2025
Weak Cube R-CNN: Weakly Supervised 3D Detection using only 2D Bounding Boxes
Andreas Lau Hansen
Lukas Wanzeck
Dim P. Papadopoulos
31
0
0
17 Apr 2025
TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors
Mingwei Li
Pu Pang
Hehe Fan
Hua Huang
Yi Yang
3DGS
34
0
0
17 Apr 2025
ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation
Hongyu Li
James Akl
Srinath Sridhar
Tye Brady
Taskin Padir
44
0
0
17 Apr 2025
Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Tyler Ga Wei Lum
Olivia Y. Lee
C. Karen Liu
Jeannette Bohg
45
1
0
17 Apr 2025
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
Qianqian Sun
Jixiang Luo
Dell Zhang
Xuelong Li
DiffM
54
0
0
17 Apr 2025
Post-Hurricane Debris Segmentation Using Fine-Tuned Foundational Vision Models
Kooshan Amini
Yuhao Liu
Jamie Ellen Padgett
Guha Balakrishnan
Ashok Veeraraghavan
33
0
0
17 Apr 2025
Image-Editing Specialists: An RLAIF Approach for Diffusion Models
Elior Benarous
Yilun Du
Heng Yang
22
0
0
17 Apr 2025
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
Mengshi Qi
Pengfei Zhu
Xianrui Li
Xiaoyang Bi
Lu Qi
Huadong Ma
Ming Yang
VOS
VLM
51
0
0
16 Apr 2025
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
Lvpan Cai
Haowei Wang
Jiayi Ji
YanShu ZhouMen
Yiwei Ma
Xiaoshuai Sun
Liujuan Cao
Rongrong Ji
ViT
39
0
0
16 Apr 2025
Learning What NOT to Count
Adriano D'Alessandro
Ali Mahdavi-Amiri
Ghassan Hamarneh
32
0
0
16 Apr 2025
Weather-Aware Object Detection Transformer for Domain Adaptation
Soheil Gharatappeh
Salimeh Yasaei Sekeh
Vikas Dhiman
ViT
31
0
0
15 Apr 2025
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
Xuelong Li
Tao Zhang
Lu Qi
Ming Yang
33
0
0
15 Apr 2025
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Hyunwoo Oh
Yezi Liu
Tamoghno Das
Mohsen Imani
VLM
LRM
39
0
0
15 Apr 2025
IlluSign: Illustrating Sign Language Videos by Leveraging the Attention Mechanism
Janna Bruner
Amit Moryossef
Lior Wolf
DiffM
SLR
50
0
0
15 Apr 2025
MediSee: Reasoning-based Pixel-level Perception in Medical Images
Qinyue Tong
Ziqian Lu
Jun Liu
Yangming Zheng
Zheming Lu
LRM
43
0
0
15 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
70
15
1
14 Apr 2025
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization
Darryl Hannan
John Cooper
Dylan White
Timothy Doster
Henry Kvinge
Y. Watkins
29
0
0
14 Apr 2025
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Haotian Xu
Yue Hu
Chen Gao
Zhengqiu Zhu
Yong Zhao
Yong Li
Quanjun Yin
39
0
0
13 Apr 2025
Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation
Yu Hao
Geeta Chandra Raju Bethala
Niraj Pudasaini
Hao Huang
Shuaihang Yuan
Congcong Wen
Baoru Huang
A. Nguyen
Yi Fang
LM&Ro
AI4CE
LRM
64
0
0
13 Apr 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
Jun Zhang
...
Jiahui Lv
Ziqiang Liu
Tengyuan Shi
Qingjie Liu
Yansen Wang
MLLM
VLM
63
1
0
13 Apr 2025
Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization
Lin Zhu
Yifeng Yang
Zichao Nie
Yuan Gao
VLM
30
0
0
13 Apr 2025
Diffusion Models for Robotic Manipulation: A Survey
Rosa Wolf
Yitian Shi
Sheng Liu
Rania Rayyes
51
1
0
11 Apr 2025
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
Jialu Li
Shoubin Yu
Han Lin
Jaemin Cho
Jaehong Yoon
Joey Tianyi Zhou
DiffM
VGen
55
0
0
11 Apr 2025
POEM: Precise Object-level Editing via MLLM control
Marco Schouten
Mehmet Onurcan Kaya
Serge Belongie
Dim P. Papadopoulos
DiffM
82
0
0
10 Apr 2025
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
Haozhan Shen
Peng Liu
Jiashi Li
Chunxin Fang
Yibo Ma
...
Zilun Zhang
Kangjia Zhao
Qianqian Zhang
Ruochen Xu
Tiancheng Zhao
VLM
LRM
76
0
0
10 Apr 2025
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu
Qizhi Chen
Z. Wang
Yiwen Tang
Yiting Zhang
Chi Yan
Dong Wang
X. Li
Bin Zhao
CoGe
49
0
0
10 Apr 2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang
Haonan Lin
Yanning Zhou
Kaiwen Xiao
47
0
0
10 Apr 2025
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration
Omar Alama
A. Bhattacharya
Haoyang He
Seungchan Kim
Yuheng Qiu
Wenshan Wang
Cherie Ho
Nikhil Varma Keetha
Sebastian A. Scherer
31
0
0
09 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Yansen Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
72
0
0
09 Apr 2025
Compass Control: Multi Object Orientation Control for Text-to-Image Generation
Rishubh Parihar
Vaibhav Agrawal
Sachidanand VS
R. V. Babu
DiffM
36
0
0
09 Apr 2025
Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection
Ruoyu Chen
Hua Zhang
Jingzhi Li
Li Liu
Zhen Huang
Xiaochun Cao
37
0
0
09 Apr 2025
Few-Shot Adaptation of Grounding DINO for Agricultural Domain
Rajhans Singh
Rafael Bidese Puhl
Kshitiz Dhakal
Sudhir Sornapudi
31
0
0
09 Apr 2025
Resource-efficient Inference with Foundation Model Programs
Lunyiu Nie
Zhimin Ding
Kevin Yu
Marco Cheung
C. Jermaine
S. Chaudhuri
30
0
0
09 Apr 2025