v1v2 (latest)

TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding

5 August 2021

Papers citing "TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding"

36 / 36 papers shown

Title
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding Feng Xiao Hongbin Xu Guocan Zhao Wenxiong Kang 205 0 0 07 May 2025
CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework Yanlong Xu Haoxuan Qu Qingbin Liu Wenxiao Zhang Xun Yang 399 0 0 04 Mar 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models Zhangyang Qi Zhixiong Zhang Ye Fang Jiaqi Wang Hengshuang Zhao 179 13 0 02 Jan 2025
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models V. Bhat Prashanth Krishnamurthy Ramesh Karri Farshad Khorrami 108 5 0 16 Sep 2024
Data-Efficient 3D Visual Grounding via Order-Aware Referring Tung-Yu Wu Sheng-Yu Huang Yu-Chiang Frank Wang 103 0 0 25 Mar 2024
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment Xiaoxu Xu Yitian Yuan Qiudan Zhang Wen-Bin Wu Zequn Jie Lin Ma Xu Wang 100 4 0 15 Dec 2023
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring Zhihao Yuan Xu Yan Yinghong Liao Ruimao Zhang Sheng Wang Zhen Li Shuguang Cui 95 135 0 01 Mar 2021
Pre-Trained Image Processing Transformer Hanting Chen Yunhe Wang Tianyu Guo Chang Xu Yiping Deng Zhenhua Liu Siwei Ma Chunjing Xu Chao Xu Wen Gao VLM ViT 143 1,677 0 01 Dec 2020
Point Transformer Nico Engel Vasileios Belagiannis Klaus C. J. Dietmayer 3DPC 181 1,994 0 02 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 664 41,103 0 22 Oct 2020
Referring Image Segmentation via Cross-Modal Progressive Comprehension Shaofei Huang Tianrui Hui Si Liu Guanbin Li Yunchao Wei Jizhong Han Luoqi Liu Yue Liu EgoV 78 183 0 01 Oct 2020
End-to-End Object Detection with Transformers Nicolas Carion Francisco Massa Gabriel Synnaeve Nicolas Usunier Alexander Kirillov Sergey Zagoruyko ViT 3DV PINN 421 13,048 0 26 May 2020
Understanding the Difficulty of Training Transformers Liyuan Liu Xiaodong Liu Jianfeng Gao Weizhu Chen Jiawei Han AI4CE 60 256 0 17 Apr 2020
Graph Structured Network for Image-Text Matching Chunxiao Liu Zhendong Mao Tianzhu Zhang Hongtao Xie Bin Wang Yongdong Zhang 76 236 0 01 Apr 2020
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language Dave Zhenyu Chen Angel X. Chang Matthias Nießner 3DPC 89 370 0 18 Dec 2019
Meshed-Memory Transformer for Image Captioning Marcella Cornia Matteo Stefanini Lorenzo Baraldi Rita Cucchiara 78 882 0 17 Dec 2019
Graph Transformer Networks Seongjun Yun Minbyul Jeong Raehyun Kim Jaewoo Kang Hyunwoo J. Kim 131 979 0 06 Nov 2019
Visual Semantic Reasoning for Image-Text Matching Kunpeng Li Yulun Zhang Keqin Li Yuanyuan Li Y. Fu VLM 87 504 0 06 Sep 2019
Attention on Attention for Image Captioning Lun Huang Wenmin Wang Jie Chen Xiao-Yong Wei 67 832 0 19 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding Zhengyuan Yang Boqing Gong Liwei Wang Wenbing Huang Dong Yu Jiebo Luo ObjD 56 362 0 18 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis Luke Zettlemoyer Veselin Stoyanov AIMat 665 24,528 0 26 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering Zhou Yu Jun Yu Yuhao Cui Dacheng Tao Q. Tian 87 806 0 25 Jun 2019
VoteNet: A Deep Learning Label Fusion Method for Multi-Atlas Segmentation Zhipeng Ding Xu Han Marc Niethammer 83 96 0 18 Apr 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,114 0 11 Oct 2018
Neural Motifs: Scene Graph Parsing with Global Context Rowan Zellers Mark Yatskar Sam Thomson Yejin Choi GNN 86 999 0 17 Nov 2017
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering Zhou Yu Jun-chen Yu Chenchao Xiang Jianping Fan Dacheng Tao 61 460 0 10 Aug 2017
Query-guided Regression Network with Context Policy for Phrase Grounding Kan Chen Rama Kovvuri Ram Nevatia 65 142 0 04 Aug 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering Peter Anderson Xiaodong He Chris Buehler Damien Teney Mark Johnson Stephen Gould Lei Zhang AIMat 121 4,220 0 25 Jul 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 713 132,199 0 12 Jun 2017
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space C. Qi L. Yi Hao Su Leonidas Guibas 3DPC 3DV 354 11,113 0 07 Jun 2017
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes Angela Dai Angel X. Chang Manolis Savva Maciej Halber Thomas Funkhouser Matthias Nießner 3DPC 3DV 481 4,062 0 14 Feb 2017
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation C. Qi Hao Su Kaichun Mo Leonidas Guibas 3DH 3DPC 3DV PINN 493 14,320 0 02 Dec 2016
Layer Normalization Jimmy Lei Ba J. Kiros Geoffrey E. Hinton 413 10,494 0 21 Jul 2016
Adversarial Feature Learning Jiasen Lu Philipp Krahenbuhl Trevor Darrell GAN 113 1,611 0 31 May 2016
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 1.9K 150,260 0 22 Dec 2014
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping A. Karpathy Armand Joulin Li Fei-Fei VLM 101 937 0 22 Jun 2014