Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.05499
Cited By
v1
v2
v3
v4 (latest)
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Re-assign community
ArXiv (abs)
PDF
HTML
Github (8136★)
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 690 papers shown
Title
Robotic Visual Instruction
Yuchen Li
Ziyang Gong
Haoyang Li
Xiaoqi Huang
Haolan Kang
Guangping Bai
Xianzheng Ma
LM&Ro
175
2
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
Jieneng Chen
LRM
144
1
0
01 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
Haoyang Li
LRM
205
24
0
01 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Ziyi Wang
Tao Jin
DiffM
313
2
0
30 Apr 2025
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs
Pranav Saxena
Nishant Raghuvanshi
Neena Goveas
139
0
0
30 Apr 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Yue Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Yifan Yang
Qi Zhang
Lili Qiu
VLM
120
1
0
30 Apr 2025
Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection
Daniel Bogdoll
Rajanikant Ananta
Abeyankar Giridharan
Isabel Moore
Gregory Stevens
Henry X. Liu
VLM
109
0
0
30 Apr 2025
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi
R. Kaur
Adam D. Cobb
Manoj Acharya
Anirban Roy
Colin Samplawski
Brian Matejek
Alexander M. Berenbeim
Nathaniel D. Bastian
Susmit Jha
80
0
0
30 Apr 2025
XeMap: Contextual Referring in Large-Scale Remote Sensing Environments
Yongqian Li
Lu Si
Y. T. Hou
Chengaung Liu
Yangqiu Song
Hongjian Fang
Jing Zhang
134
0
0
30 Apr 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
169
0
0
30 Apr 2025
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
Linshan Wu
Yuxiang Nie
Sunan He
Jiaxin Zhuang
Hao Chen
...
V. Vardhanabhuti
R. Chan
Yifan Peng
Pranav Rajpurkar
Hao Chen
LM&MA
MedIm
201
0
0
30 Apr 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
107
0
0
30 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
Xuzhao Li
MLLM
256
0
0
29 Apr 2025
Explaining Vision GNNs: A Semantic and Visual Analysis of Graph-based Image Classification
Nikolaos Chaidos
Angeliki Dimitriou
Nikolaos Spanos
Athanasios Voulodimos
Giorgos Stamou
77
1
0
28 Apr 2025
Simultaneous Pick and Place Detection by Combining SE(3) Diffusion Models with Differential Kinematics
Tianyi Ko
Takuya Ikeda
Koichi Nishiwaki
79
0
0
28 Apr 2025
If Concept Bottlenecks are the Question, are Foundation Models the Answer?
Nicola Debole
Pietro Barbiero
Francesco Giannini
Andrea Passerini
Stefano Teso
Emanuele Marconato
527
1
0
28 Apr 2025
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Zishen Wan
Jiayi Qian
Yuhang Du
Jason J. Jabbour
Yilun Du
Yang Katie Zhao
A. Raychowdhury
Tushar Krishna
Vijay Janapa Reddi
LM&Ro
191
1
0
26 Apr 2025
TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians
Letian Huang
Dongwei Ye
Jialin Dan
Chengzhi Tao
Huiwen Liu
Kun Zhou
Bo Ren
You Li
Yanwen Guo
Jie Guo
135
1
0
26 Apr 2025
SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting
Yiming Zhao
Guorong Li
Laiyun Qing
Amin Beheshti
Jian Yang
Michael Sheng
Yuankai Qi
Qingming Huang
VLM
VPVLM
116
0
0
24 Apr 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas Guibas
Minhyuk Sung
LRM
113
3
0
24 Apr 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin
Hagai Taitelbaum
Yonatan Bitton
Brian Gordon
Michal Sokolik
Nitzan Bitton-Guetta
Almog Gueta
Royi Rassin
Itay Laish
Dani Lischinski
EGVM
VGen
114
0
0
24 Apr 2025
VideoVista-CulturalLingo: 360
∘
^\circ
∘
Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
Xinyu Chen
Yunxin Li
Haoyuan Shi
Baotian Hu
Wenhan Luo
Yaowei Wang
Hao Fei
ELM
123
0
0
23 Apr 2025
MorphoNavi: Aerial-Ground Robot Navigation with Object Oriented Mapping in Digital Twin
Sausar Karaf
Mikhail Martynov
Oleg Sautenkov
Zhanibek Darush
Dzmitry Tsetserukou
76
1
0
23 Apr 2025
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
Zebin Yao
Lujie Niu
Huixing Jiang
Chen Wei
Fangkun Zhao
Ruifan Li
Fangxiang Feng
DiffM
192
0
0
22 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro
LRM
104
0
0
22 Apr 2025
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
Jinda Lu
Jinghan Li
Yuan Gao
Junkang Wu
Jiancan Wu
Xiang Wang
Xiangnan He
429
1
0
22 Apr 2025
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li
Jinglin Xu
Yunzhen Zhao
Yuxin Peng
ObjD
96
3
0
21 Apr 2025
DRAWER: Digital Reconstruction and Articulation With Environment Realism
Hongchi Xia
Entong Su
Marius Memmel
Arhan Jain
Raymond Yu
Numfor Mbiziwo-Tiapo
Ali Farhadi
Abhishek Gupta
Shenlong Wang
Wei-Chiu Ma
VGen
122
1
0
21 Apr 2025
Object-Level Verbalized Confidence Calibration in Vision-Language Models via Semantic Perturbation
Yunpu Zhao
Rui Zhang
Junbin Xiao
Ruibo Hou
Jiaming Guo
Zihao Zhang
Yifan Hao
Yunji Chen
81
1
0
21 Apr 2025
Insert Anything: Image Insertion via In-Context Editing in DiT
Wensong Song
Hong Jiang
Zongxing Yang
Ruijie Quan
Yi Yang
DiffM
126
4
0
21 Apr 2025
Emergence and Evolution of Interpretable Concepts in Diffusion Models
Berk Tinaz
Zalan Fabian
Mahdi Soltanolkotabi
DiffM
62
0
0
21 Apr 2025
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
Jiachen Li
Qing Xie
Xiaohan Yu
Hongyun Wang
Jinyu Xu
Yongjian Liu
ObjD
159
0
0
20 Apr 2025
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
Weijun Zhuang
Qizhang Li
Xin Li
Ming-Yu Liu
Xiaopeng Hong
Feng Gao
Fan Yang
W. Zuo
88
0
0
20 Apr 2025
ApexNav: An Adaptive Exploration Strategy for Zero-Shot Object Navigation with Target-centric Semantic Fusion
Mingjie Zhang
Yuheng Du
Chengkai Wu
Jinni Zhou
Zhenchao Qi
Jun Ma
Boyu Zhou
220
0
0
20 Apr 2025
SG-Reg: Generalizable and Efficient Scene Graph Registration
Chuhao Liu
Zhijian Qiao
Jieqi Shi
Ke Wang
Peize Liu
Shaojie Shen
132
0
0
20 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
83
2
0
19 Apr 2025
BeetleVerse: A study on taxonomic classification of ground beetles
S M Rayeed
Alyson East
Samuel Stevens
Sydne Record
Charles V. Stewart
53
0
0
18 Apr 2025
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun
Junbin Xiao
Tze Ho Elden Tse
Yicong Li
Arjun Akula
Angela Yao
EgoV
89
0
0
18 Apr 2025
Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation
SoYoung Park
Hyewon Lee
M. Choi
Seunghoon Han
Jong-Ryul Lee
Sungsu Lim
Tae-Ho Kim
VLM
102
0
0
18 Apr 2025
Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes
Sridevi Polavaram
Xin Zhou
Meenu Ravi
Mohammad Zarei
Anmol Srivastava
45
0
0
18 Apr 2025
ESPLoRA: Enhanced Spatial Precision with Low-Rank Adaption in Text-to-Image Diffusion Models for High-Definition Synthesis
Andrea Rigo
Luca Stornaiuolo
Mauro Martino
Bruno Lepri
N. Sebe
94
0
0
18 Apr 2025
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
Qianqian Sun
Jixiang Luo
Dell Zhang
Xuelong Li
DiffM
82
0
0
17 Apr 2025
TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors
Mingwei Li
Pu Pang
Hehe Fan
Hua Huang
Yi Yang
3DGS
68
0
0
17 Apr 2025
Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Tyler Ga Wei Lum
Olivia Y. Lee
C. Karen Liu
Jeannette Bohg
126
1
0
17 Apr 2025
Post-Hurricane Debris Segmentation Using Fine-Tuned Foundational Vision Models
Kooshan Amini
Yuhao Liu
Jamie Ellen Padgett
Guha Balakrishnan
Ashok Veeraraghavan
84
0
0
17 Apr 2025
Weak Cube R-CNN: Weakly Supervised 3D Detection using only 2D Bounding Boxes
Andreas Lau Hansen
Lukas Wanzeck
Dim P. Papadopoulos
65
0
0
17 Apr 2025
ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation
Hongyu Li
James Akl
Srinath Sridhar
Tye Brady
Taskin Padir
194
1
0
17 Apr 2025
Image-Editing Specialists: An RLAIF Approach for Diffusion Models
Elior Benarous
Yilun Du
Heng Yang
65
0
0
17 Apr 2025
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
Lvpan Cai
Haowei Wang
Jiayi Ji
YanShu ZhouMen
Yiwei Ma
Xiaoshuai Sun
Liujuan Cao
Rongrong Ji
ViT
90
1
0
16 Apr 2025
Learning What NOT to Count
Adriano DÁlessandro
Ali Mahdavi-Amiri
Ghassan Hamarneh
87
0
0
16 Apr 2025
Previous
1
2
3
4
5
...
12
13
14
Next