Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.02999
Cited By
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
6 September 2023
Sijin Chen
Erik Cambria
Mingsheng Li
Xin Chen
Peng Guo
Yinjie Lei
Gang Yu
Taihao Li
Tao Chen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning"
11 / 11 papers shown
Title
RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation
Naman Patel
Prashanth Krishnamurthy
Farshad Khorrami
27
0
0
21 May 2025
DC-Scene: Data-Centric Learning for 3D Scene Understanding
Ting Huang
Zeyu Zhang
Ruicheng Zhang
Yang Zhao
22
0
0
21 May 2025
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
64
0
0
08 May 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
59
7
0
21 Feb 2025
OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs
Venkata Naren Devarakonda
Raktim Gautam Goswami
Ali Umut Kaypak
Naman Patel
Rooholla Khorrambakht
Prashanth Krishnamurthy
Farshad Khorrami
LM&Ro
59
5
0
08 Oct 2024
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
MLLM
46
85
0
30 Nov 2023
Lite DETR : An Interleaved Multi-Scale Encoder for Efficient DETR
Feng Li
Ailing Zeng
Siyi Liu
Hao Zhang
Hongyang Li
Lei Zhang
L. Ni
ViT
51
68
0
13 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
324
4,362
0
30 Jan 2023
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
44
15
0
08 Oct 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
67
63
0
29 Sep 2022
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Adam Paszke
Abhishek Chaurasia
Sangpil Kim
Eugenio Culurciello
SSeg
238
2,063
0
07 Jun 2016
1