arXiv:2405.17104
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
27 May 2024
Haoyu Zhao
Wenhang Ge
Ying-Cong Chen
ObjD
MLLM
VLM
Papers citing
"LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding"
19 / 19 papers shown
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
381
0
0
01 Dec 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu
Guandao Yang
Zhibing Li
Kai Zhang
Ziwei Liu
Leonidas Guibas
Dahua Lin
Gordon Wetzstein
EGVM
VGen
54
90
0
08 Jan 2024
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
Jianing Yang
Xuweiyi Chen
Shengyi Qian
Nikhil Madaan
Madhavan Iyengar
David Fouhey
Joyce Chai
LM&Ro
LLMAG
98
90
0
21 Sep 2023
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
58
67
0
17 Aug 2023
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
Lewei Yao
Jianhua Han
Youpeng Wen
Xiaodan Liang
Dan Xu
Wei Zhang
Zhenguo Li
Chunjing Xu
Hang Xu
CLIP
VLM
131
156
0
20 Sep 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD
VLM
73
299
0
12 Jun 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
136
1,399
0
07 Mar 2022
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Feng Li
Hao Zhang
Shi-guang Liu
Jian Guo
L. Ni
Lei Zhang
ViT
101
660
0
02 Mar 2022
Object Detection in Autonomous Vehicles: Status and Open Challenges
Abhishek Balasubramaniam
S. Pasricha
68
55
0
19 Jan 2022
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
106
571
0
16 Dec 2021
Conditional DETR for Fast Training Convergence
Depu Meng
Xiaokang Chen
Zejia Fan
Gang Zeng
Houqiang Li
Yuhui Yuan
Lei-huan Sun
Jingdong Wang
ViT
40
615
0
13 Aug 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
152
876
0
26 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
56
338
0
17 Apr 2021
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
164
4,993
0
08 Oct 2020
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
225
288
0
19 Mar 2020
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
59
158
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
43
361
0
18 Aug 2019
Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges
Di Feng
Christian Haase-Schuetz
Lars Rosenbaum
Heinz Hertlein
Claudius Gläser
Fabian Duffhauss
W. Wiesbeck
Klaus C. J. Dietmayer
3DPC
71
1,000
0
21 Feb 2019
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
412
61,900
0
04 Jun 2015