
arXiv:2303.05499

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
ArXiv (abs) | PDF | HTML | Github (8136★)

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 690 papers shown
Title
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab
M. Maruf
Arka Daw
Harish Babu Manogaran
Abhilash Neog
...
Paula Mabee
Wasila M Dahdul
Anuj Karpatne
233
4
0
10 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM, DiffM
132
40
0
08 Jul 2024
ClutterGen: A Cluttered Scene Generator for Robot Learning
Yinsen Jia
Boyuan Chen
114
4
0
07 Jul 2024
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip Torr
Lu Yuan
LRM, VLM
81
8
0
05 Jul 2024
CountGD: Multi-Modal Open-World Counting
Niki Amini-Naieni
Tengda Han
Andrew Zisserman
ObjD
161
13
0
05 Jul 2024
Open Scene Graphs for Open World Object-Goal Navigation
Joel Loo
Zhanxin Wu
David Hsu
LM&Ro
94
5
0
02 Jul 2024
OpenSlot: Mixed Open-Set Recognition with Object-Centric Learning
Xu Yin
Fei Pan
G. An
Yuchi Huo
Zixuan Xie
Sung-eui Yoon
BDL, VLM
156
1
0
02 Jul 2024
ViG-Bias: Visually Grounded Bias Discovery and Mitigation
Badr-Eddine Marani
Mohamed Hanini
Nihitha Malayarukil
Stergios Christodoulidis
Maria Vakalopoulou
Enzo Ferrante
62
0
0
02 Jul 2024
MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis
Dewei Zhou
Yuchen Li
Fan Ma
Zongxin Yang
Yue Yang
175
11
0
02 Jul 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
Tianqi Xu
Linyao Chen
Dai-Jie Wu
Yanjun Chen
Zecheng Zhang
...
Zhaoxuan Jin
Ge Li
Philip Torr
Bernard Ghanem
Guohao Li
161
21
0
01 Jul 2024
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao
Aishan Liu
QianJia Cheng
Zhenfei Yin
Siyuan Liang
Jiapeng Li
Jing Shao
Xianglong Liu
Dacheng Tao
129
8
0
30 Jun 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
196
31
0
28 Jun 2024
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Yicheng Chen
Xiangtai Li
Yining Li
Yanhong Zeng
Jianzong Wu
Xiangyu Zhao
Kai Chen
VLM, DiffM
164
3
0
28 Jun 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao
Xiangtai Li
Haodong Duan
Haian Huang
Yining Li
Kai Chen
Hua Yang
VLM, MLLM
120
12
0
25 Jun 2024
High-resolution open-vocabulary object 6D pose estimation
Jaime Corsetti
Davide Boscaini
Francesco Giuliari
Changjae Oh
Andrea Cavallaro
Fabio Poiesi
81
2
0
24 Jun 2024
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models
Jie Ren
Kangrui Chen
Yingqian Cui
Shenglai Zeng
Hui Liu
Yue Xing
Jiliang Tang
Lingjuan Lyu
118
2
0
21 Jun 2024
Learning Efficient and Robust Language-conditioned Manipulation using Textual-Visual Relevancy and Equivariant Language Mapping
Mingxi Jia
Haojie Huang
Zhewen Zhang
Chenghao Wang
Linfeng Zhao
Dian Wang
J. Liu
Robin Walters
Robert Platt
Stefanie Tellex
LM&Ro
109
6
0
21 Jun 2024
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features
Letian Wang
Seung Wook Kim
Jiawei Yang
Cunjun Yu
Boris Ivanovic
Steven Waslander
Yue Wang
Sanja Fidler
Marco Pavone
Peter Karkus
100
9
0
17 Jun 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
174
4
0
17 Jun 2024
DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning
Zeyu Gao
Yao Mu
Jinye Qu
Mengkang Hu
Lingyue Guo
Ping Luo
Shanghang Zhang
Yanfeng Lu
131
11
0
14 Jun 2024
LaMOT: Language-Guided Multi-Object Tracking
Yunhao Li
Xiaoqiong Liu
Luke Liu
Heng Fan
Libo Zhang
VOT
80
3
0
12 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
152
53
0
11 Jun 2024
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang
Qifan Zhang
Yu-Wei Chao
Bowen Wen
Xiaohu Guo
Yu Xiang
3DH
156
2
0
10 Jun 2024
InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping
Yunchao Zhang
Guandao Yang
Leonidas Guibas
Yanchao Yang
3DGS
90
1
0
09 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
205
15
0
09 Jun 2024
Matching Anything by Segmenting Anything
Siyuan Li
Lei Ke
Martin Danelljan
Luigi Piccinelli
Mattia Segu
Luc Van Gool
Fisher Yu
VOS
109
27
0
06 Jun 2024
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra
Angela Dai
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Fahad Shahbaz Khan
VLM, ISeg
190
6
0
04 Jun 2024
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Junhao Cheng
Xi Lu
Hanhui Li
Khun Loun Zai
Baiqiao Yin
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM, VGen
133
11
0
03 Jun 2024
Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners
Zhi Zheng
Qian Feng
Hang Li
Alois C. Knoll
Jianxiang Feng
164
7
0
01 Jun 2024
On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
Selim Kuzucu
Kemal Oksuz
Jonathan Sadeghi
P. Dokania
89
5
0
30 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM, ISeg
153
5
0
28 May 2024
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
Lin Zhu
Yifeng Yang
Qinying Gu
Xinbing Wang
Cheng Zhou
Nanyang Ye
VLM
119
2
0
26 May 2024
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2
Simon Damm
M. Laszkiewicz
Johannes Lederer
Asja Fischer
131
8
0
23 May 2024
PerSense: Personalized Instance Segmentation in Dense Images
Muhammad Ibraheem Siddiqui
Muhammad Umer Sheikh
Hassan Abid
Muhammad Haris Khan
VLM
129
0
0
22 May 2024
BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once
Theodore Zhao
Yu Gu
Jianwei Yang
Naoto Usuyama
Ho Hin Lee
...
B. Piening
Carlo Bifulco
Mu-Hsin Wei
Hoifung Poon
Sheng Wang
MedIm
103
28
0
21 May 2024
In The Wild Ellipse Parameter Estimation for Circular Dining Plates and Bowls
Akil Pathiranage
Chris Czarnecki
Yuhao Chen
Pengcheng Xi
Linlin Xu
Alexander Wong
30
0
0
12 May 2024
To Ask or Not To Ask: Human-in-the-loop Contextual Bandits with Applications in Robot-Assisted Feeding
Rohan Banerjee
Rajat Kumar Jenamani
Sidharth Vasudev
Amal Nanavati
Katherine Dimitropoulou
Sarah Dean
Tapomayukh Bhattacharjee
223
2
0
11 May 2024
A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective
Huaiyuan Xu
Junliang Chen
Shiyu Meng
Yi Wang
Lap-Pui Chau
3DPC
111
21
0
08 May 2024
Video Diffusion Models: A Survey
Andrew Melnik
Michal Ljubljanac
Cong Lu
Qi Yan
Weiming Ren
Helge J. Ritter
VGen
150
16
0
06 May 2024
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng
Baiqiao Yin
Kaixin Cai
Minbin Huang
Hanhui Li
...
Yue Li
Yifei Li
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM, MLLM
138
13
0
29 Apr 2024
What Foundation Models can Bring for Robot Learning in Manipulation: A Survey
Dingzhe Li
Yixiang Jin
Yong A
Hongze Yu
...
Huaping Liu
Gang Hua
F. Sun
Jianwei Zhang
Bin Fang
AI4CE, LM&Ro
226
15
0
28 Apr 2024
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
Xin Jiang
Hao Tang
Rui Yan
Jinhui Tang
Zechao Li
82
5
0
24 Apr 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
223
28
0
22 Apr 2024
Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning
Rui Hu
Yahan Tu
Shuyu Wei
Dongyuan Lu
Jitao Sang
MLLM
62
0
0
16 Apr 2024
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
Hongxin Zhang
Zeyuan Wang
Qiushi Lyu
Zheyuan Zhang
Sunli Chen
Tianmin Shu
Kwonjoon Lee
Yilun Du
Chuang Gan
169
18
0
16 Apr 2024
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
Kim Hoang Tran
Phuc Vuong Do
Ngoc Quoc Ly
Ngan Le
75
4
0
15 Apr 2024
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie
Julong Wei
Zhengjun Wang
Ke Wu
Shansuai Yuan
Kaizhao Zhang
Jie Jia
Jieru Zhao
Zhongxue Gan
Wenchao Ding
125
6
0
10 Apr 2024
Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models
Yutao Ouyang
Jinhan Li
Yunfei Li
Zhongyu Li
Chao Yu
Koushil Sreenath
Yi Wu
139
15
0
08 Apr 2024
DL-EWF: Deep Learning Empowering Women's Fashion with Grounded-Segment-Anything Segmentation for Body Shape Classification
Fatemeh Asghari
M. Soheili
Faeze Gholamrezaie
3DH
62
0
0
07 Apr 2024
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan
B. Vijaykumar
S. Schulter
Yun Fu
Manmohan Chandraker
LRM, ReLM
98
8
0
06 Apr 2024