Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.05499
Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,338 papers shown
Title
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo
Lijun Zhang
Mengyang Sun
Lin Yuanbo Wu
Peng Wang
Yujie Zhang
MLLM
VLM
47
1
0
01 Mar 2025
Solving Instance Detection from an Open-World Perspective
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
43
0
0
01 Mar 2025
Technical Report for ReID-SAM on SkiTB Visual Tracking Challenge 2025
Kunjun Li
Cheng-Yen Yang
Hsiang-Wei Huang
Lei Li
54
0
0
28 Feb 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
ObjD
VLM
52
0
0
28 Feb 2025
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian
Zhongliang Guo
Bowen Deng
Chun Tong Lei
Shuai Zhao
Chun Pong Lau
Xiaopeng Hong
Michael P. Pound
DiffM
59
0
0
28 Feb 2025
LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding
Ang Cao
Sergio Arnaud
Oleksandr Maksymets
Jianing Yang
Ayush Jain
...
Aravind Rajeswaran
Franziska Meier
Justin Johnson
Jeong Joon Park
Alexander Sax
70
0
0
27 Feb 2025
C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
Yuhao Li
Mirana Claire Angel
Salman Khan
Yu Zhu
Jinqiu Sun
Yanning Zhang
Fahad Shahbaz Khan
VGen
51
0
0
27 Feb 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
Jun Liu
Peng Wang
Guoqing Wang
Yuqing Yang
H. Shen
ObjD
94
0
0
27 Feb 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Zhenyu Liu
Yunxin Li
Baotian Hu
Wenhan Luo
Yaowei Wang
Min-Ling Zhang
65
0
0
27 Feb 2025
A Survey on Foundation-Model-Based Industrial Defect Detection
Tianle Yang
Luyao Chang
Jiadong Yan
Jiyang Li
Zhi Wang
Ke Zhang
AI4CE
90
2
0
26 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
71
0
0
26 Feb 2025
FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks
Tanawan Premsri
Parisa Kordjamshidi
43
1
0
25 Feb 2025
FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation
Young Beom Woo
Sun Eung Kim
DiffM
48
0
0
24 Feb 2025
Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models
Yaqi Sun
Kyohei Atarashi
Koh Takeuchi
Hisashi Kashima
MLLM
53
0
0
24 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
76
0
0
24 Feb 2025
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
Yuheng Xue
Nenglun Chen
Jun Liu
Wenyun Sun
3DPC
73
7
0
24 Feb 2025
SLABIM: A SLAM-BIM Coupled Dataset in HKUST Main Building
Haoming Huang
Zhijian Qiao
Zehuan Yu
Chuhao Liu
Shaojie Shen
Fumin Zhang
Huan Yin
41
0
0
24 Feb 2025
Anatomical grounding pre-training for medical phrase grounding
Wenjun Zhang
Shakes Chandra
Aaron Nicolson
MedIm
30
0
0
23 Feb 2025
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
Caixiong Li
Xiongwei Zhao
Jinhang Zhang
Xing Zhang
Qihao Sun
Zhou Wu
ObjD
MLLM
VLM
56
0
0
23 Feb 2025
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
D. She
Mushui Liu
Jingxuan Pang
Jin Wang
Zhen Yang
...
Yi Wang
Qihan Huang
Haobin Tang
YunLong Yu
Siming Fu
VGen
96
4
0
21 Feb 2025
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization
Zheyuan Zhang
Runze Li
Tasnim Kabir
Jordan Boyd-Graber
55
0
0
21 Feb 2025
VaLID: Verification as Late Integration of Detections for LiDAR-Camera Fusion
Vanshika Vats
Marzia Binta Nizam
James Davis
3DPC
69
0
0
21 Feb 2025
DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation
Luzhou Ge
Xiangyu Zhu
Zhuo Yang
Xuesong Li
3DGS
70
0
0
21 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
128
2
0
21 Feb 2025
SegSub: Evaluating Robustness to Knowledge Conflicts and Hallucinations in Vision-Language Models
Peter Carragher
Nikitha Rao
Abhinand Jha
R Raghav
Kathleen M. Carley
VLM
56
0
0
19 Feb 2025
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Kaixin Yao
Longwen Zhang
Xinhao Yan
Yan Zeng
Qixuan Zhang
Wei Yang
Lan Xu
Jiayuan Gu
Jingyi Yu
29
3
0
18 Feb 2025
A Survey of Text Classification Under Class Distribution Shift
Adriana Valentina Costache
Silviu Florin Gheorghe
Eduard Poesina
Paul Irofti
Radu Tudor Ionescu
OOD
VLM
60
0
0
18 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
131
4
0
18 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
122
9
0
18 Feb 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
Xinlong Chen
Yang Zhang
Chongling Rao
Yushuo Guan
Qingbin Liu
Fuzheng Zhang
Chengru Song
Qiang Liu
Di Zhang
Tieniu Tan
17
0
0
18 Feb 2025
FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion
Yufan Zhou
Haoyu Shen
Huan Wang
DiffM
113
0
0
17 Feb 2025
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Yujie Lin
Ante Wang
Moye Chen
Jingyao Liu
Hao Liu
Jinsong Su
Xinyan Xiao
LRM
50
2
0
17 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
Xinsong Zhang
Jingyu Wang
Wenbing Tao
64
4
0
17 Feb 2025
Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos
Weirui Ye
Fangchen Liu
Z. Ding
Yang Gao
Oleh Rybkin
Pieter Abbeel
VGen
OffRL
88
3
0
14 Feb 2025
HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation
Yibo Liu
Zhaodong Jiang
Binbin Xu
Guile Wu
Y. Ren
Tongtong Cao
Bingbing Liu
Rui Heng Yang
Amir Rasouli
J. Shan
52
1
0
14 Feb 2025
Imit Diff: Semantics Guided Diffusion Transformer with Dual Resolution Fusion for Imitation Learning
Yuhang Dong
Haizhou Ge
Yupei Zeng
Jingyang Zhang
Beiwen Tian
...
Yufei Jia
Ruixiang Wang
Ran Yi
Guyue Zhou
Longhua Ma
56
0
0
11 Feb 2025
Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior
Lee Hyoseok
Kyeong Seon Kim
Kwon Byung-Ki
Tae-Hyun Oh
MDE
185
0
0
10 Feb 2025
Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
3DGS
3DPC
AI4CE
63
1
0
09 Feb 2025
LeAP: Consistent multi-domain 3D labeling using Foundation Models
Simon Gebraad
Andras Palffy
Holger Caesar
155
1
0
06 Feb 2025
Tell2Reg: Establishing spatial correspondence between images by the same language prompts
Wen Yan
Qianye Yang
Shiqi Huang
Yipei Wang
S. Punwani
M. Emberton
V. Stavrinides
Yipeng Hu
D. Barratt
92
0
0
05 Feb 2025
Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting
Keyi Zhu
Jiajia Li
Kaixiang Zhang
Chaaran Arunachalam
Siddhartha Bhattacharya
R. Lu
Zhaojian Li
87
0
0
03 Feb 2025
RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception
Joshua R. Waite
Md Zahid Hasan
Qisai Liu
Zhanhong Jiang
Chinmay Hegde
S. Sarkar
OffRL
SyDa
184
1
0
31 Jan 2025
Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
A. H. Tan
Angus Fung
Haitong Wang
G. Nejat
99
2
0
31 Jan 2025
A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches
Luca Ciampi
Ali Azmoudeh
Elif Ecem Akbaba
Erdi Sarıtaş
Ziya Ata Yazıcı
H. K. Ekenel
Giuseppe Amato
Fabrizio Falchi
102
0
0
31 Jan 2025
VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback
Sayeh Gholipour Picha
D. Chanti
A. Caplier
MedIm
58
0
0
29 Jan 2025
Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation
Reza Akbarian Bafghi
Carden Bagwell
Avinash Ravichandran
Ashish Shrivastava
M. Raissi
50
0
0
28 Jan 2025
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
Mai A. Shaaban
Adnan Khan
Mohammad Yaqub
LM&MA
86
2
0
28 Jan 2025
An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control
Aosong Feng
Weikang Qiu
Jinbin Bai
Xiao Zhang
Zhen Dong
Kaicheng Zhou
Rex Ying
Leandros Tassiulas
DiffM
63
6
0
28 Jan 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
94
0
0
25 Jan 2025
PAID: A Framework of Product-Centric Advertising Image Design
Hongyu Chen
Min Zhou
Jing Jiang
Jiale Chen
Yang Lu
Bo Xiao
T. Ge
Bo Zheng
DiffM
VLM
43
0
0
24 Jan 2025
Previous
1
2
3
...
5
6
7
...
25
26
27
Next