Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.05499
Cited By
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
50 / 1,339 papers shown
Title
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Chaoyang Zhu
Long Chen
ObjD
VLM
31
33
0
18 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
44
108
0
17 Jul 2023
Sim2Plan: Robot Motion Planning via Message Passing between Simulation and Reality
Yizhou Zhao
Yuanhong Zeng
Qiang Long
Ying Nian Wu
Song-Chun Zhu
35
0
0
15 Jul 2023
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments
R. Liu
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ke Cao
Yufan Chen
Kailun Yang
Rainer Stiefelhagen
27
15
0
15 Jul 2023
OG: Equip vision occupancy with instance segmentation and visual grounding
Zichao Dong
Hang Ji
Weikun Zhang
Xufeng Huang
Junbo Chen
ISeg
VLM
34
0
0
12 Jul 2023
AutoDecoding Latent 3D Diffusion Models
Evangelos Ntavelis
Aliaksandr Siarohin
Kyle Olszewski
Chao-Yuan Wang
Luc Van Gool
Sergey Tulyakov
DiffM
41
43
0
07 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
85
225
0
07 Jul 2023
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation
Yonglin Li
Jing Zhang
Xiao Teng
Long Lan
VOS
VLM
23
18
0
03 Jul 2023
Hierarchical Open-vocabulary Universal Image Segmentation
Xudong Wang
Shufang Li
Konstantinos Kallidromitis
Yu Kato
Kazuki Kozuka
Trevor Darrell
VLM
OCL
51
37
0
03 Jul 2023
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang
Linjie Li
Kevin Qinghong Lin
Yuanhao Zhai
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
VGen
32
74
0
30 Jun 2023
Counting Guidance for High Fidelity Text-to-Image Synthesis
Wonjune Kang
Kevin Galim
H. Koo
Nam Ik Cho
DiffM
36
8
0
30 Jun 2023
The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot
L. Osco
Qiusheng Wu
Eduardo Lopes de Lemos
W. Gonçalves
A. P. Ramos
Jonathan Li
J. M. Junior
VLM
16
181
0
29 Jun 2023
KITE: Keypoint-Conditioned Policies for Semantic Manipulation
Priya Sundaresan
Suneel Belkhale
Dorsa Sadigh
Jeannette Bohg
LM&Ro
33
24
0
29 Jun 2023
RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model
Keyan Chen
Chenyang Liu
Hao Chen
Haotian Zhang
Wenyuan Li
Zhengxia Zou
Z. Shi
VLM
21
203
0
28 Jun 2023
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD
VLM
45
136
0
28 Jun 2023
What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation
Benedikt Blumenstiel
Johannes Jakubik
Hilde Kuhne
Michael Vossing
VLM
32
15
0
27 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
44
598
0
27 Jun 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
Chong Chen
Yaochu Jin
Lingjuan Lyu
AIFin
AI4CE
99
86
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
50
702
0
26 Jun 2023
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications
Chaoning Zhang
Dongshen Han
Yu Qiao
Jung Uk Kim
Sung-Ho Bae
Seungkyu Lee
Choong Seon Hong
VLM
41
329
0
25 Jun 2023
DesCo: Learning Object Recognition with Rich Language Descriptions
Liunian Harold Li
Zi-Yi Dou
Nanyun Peng
Kai-Wei Chang
ObjD
VLM
34
20
0
24 Jun 2023
Robustness of Segment Anything Model (SAM) for Autonomous Driving in Adverse Weather Conditions
Xinru Shan
Chaoning Zhang
VLM
33
12
0
23 Jun 2023
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation
Xiwen Liang
Liang Ma
Shanshan Guo
Jianhua Han
Hang Xu
Shikui Ma
Xiaodan Liang
LM&Ro
LLMAG
90
4
0
17 Jun 2023
Seeing the World through Your Eyes
Hadi Alzayer
Kevin Zhang
Brandon Yushan Feng
Christopher A. Metzler
Jia-Bin Huang
CVBM
30
16
0
15 Jun 2023
Robustness Analysis on Foundational Segmentation Models
Madeline Chantry Schiappa
Shehreen Azad
V. Sachidanand
Yunhao Ge
O. Mikšík
Yogesh S Rawat
Vibhav Vineet
OOD
VLM
AAML
30
6
0
15 Jun 2023
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection
Yunkang Cao
Xiaohao Xu
Chen Sun
Y. Cheng
Liang Gao
Weiming Shen
44
1
0
15 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
41
7
0
14 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
35
72
0
14 Jun 2023
Robustness of SAM: Segment Anything Under Corruptions and Beyond
Yu Qiao
Chaoning Zhang
Taegoo Kang
Donghun Kim
Chenshuang Zhang
Choong Seon Hong
AAML
25
33
0
13 Jun 2023
detrex: Benchmarking Detection Transformers
Tianhe Ren
Siyi Liu
Feng Li
Hao Zhang
Ailing Zeng
...
Zhaoyang Zeng
Xianbiao Qi
Yuhui Yuan
Jianwei Yang
Lei Zhang
40
13
0
12 Jun 2023
Transferring Foundation Models for Generalizable Robotic Manipulation
Jiange Yang
Wenhui Tan
Chuhao Jin
Keling Yao
Bei Liu
Jianlong Fu
Ruihua Song
Gangshan Wu
Limin Wang
LM&Ro
57
6
0
09 Jun 2023
Matting Anything
Jiacheng Li
Jitesh Jain
Humphrey Shi
VLM
36
16
0
08 Jun 2023
Modular Visual Question Answering via Code Generation
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
29
47
0
08 Jun 2023
Fine-Grained Visual Prompting
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD
VLM
37
61
0
07 Jun 2023
Matte Anything: Interactive Natural Image Matting with Segment Anything Models
J. Yao
Xinggang Wang
Lang Ye
Wenyu Liu
28
38
0
07 Jun 2023
Recognize Anything: A Strong Image Tagging Model
Youcai Zhang
Xinyu Huang
Jinyu Ma
Zhaoyang Li
Zhaochuan Luo
...
Tong Luo
Yaqian Li
Siyi Liu
Yandong Guo
Lei Zhang
VLM
35
225
0
06 Jun 2023
Zero-Shot 3D Shape Correspondence
Ahmed Abdelreheem
Abdelrahman Eldesokey
M. Ovsjanikov
Peter Wonka
33
24
0
05 Jun 2023
LRVS-Fashion: Extending Visual Search with Referring Instructions
Simon Lepage
Jérémie Mary
David Picard
25
1
0
05 Jun 2023
Understanding Segment Anything Model: SAM is Biased Towards Texture Rather than Shape
Chaoning Zhang
Yu Qiao
Shehbaz Tariq
Sheng Zheng
Chenshuang Zhang
Chenghao Li
Hyundong Shin
Choong Seon Hong
VLM
42
10
0
03 Jun 2023
Segment Anything in High Quality
Lei Ke
Mingqiao Ye
Martin Danelljan
Yifan Liu
Yu-Wing Tai
Chi-Keung Tang
Feng Yu
VLM
37
311
0
02 Jun 2023
LOWA: Localize Objects in the Wild with Attributes
Xiaoyuan Guo
Kezhen Chen
Jinmeng Rao
Yawen Zhang
Baochen Sun
Jie Yang
ObjD
46
2
0
31 May 2023
RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment
Guian Fang
Zutao Jiang
Jianhua Han
Guangsong Lu
Hang Xu
Shengcai Liao
Xiaodan Liang
EGVM
29
1
0
31 May 2023
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
38
30
0
30 May 2023
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
Rui Yang
Lin Song
Yanwei Li
Sijie Zhao
Yixiao Ge
Xiu Li
Ying Shan
SyDa
MLLM
36
209
0
30 May 2023
Contextual Object Detection with Multimodal Large Language Models
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
41
78
0
29 May 2023
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
Fu Lee Wang
Wenshuo Chen
Guanglu Song
Han-Jia Ye
Yu Liu
Hongsheng Li
VGen
DiffM
50
89
0
29 May 2023
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
Qian Wang
Biao Zhang
Michael Birsak
Peter Wonka
DiffM
30
31
0
29 May 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
36
5
0
28 May 2023
Text-to-image Editing by Image Information Removal
Zhongping Zhang
Jian Zheng
Jacob Zhiyuan Fang
Bryan A. Plummer
DiffM
31
12
0
27 May 2023
Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models
Yunhao Ge
Jie Jessie Ren
Jiaping Zhao
Kaifeng Chen
Andrew Gallagher
Laurent Itti
Balaji Lakshminarayanan
VLM
ObjD
26
1
0
26 May 2023
Previous
1
2
3
...
25
26
27
Next