Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction

11 June 2018

Papers citing "Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction"


Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model. Alaa Dalaq, Muzammil Behzad. 25 May 2025. [VLM]
RAIDER: Tool-Equipped Large Language Model Agent for Robotic Action Issue Detection, Explanation and Recovery. Silvia Izquierdo-Badiola, Carlos Rizzo, Guillem Alenyà. 22 Mar 2025. [LLMAG, LM&Ro]
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models. V. Bhat, Prashanth Krishnamurthy, Ramesh Karri, Farshad Khorrami. 16 Sep 2024.
General-purpose Clothes Manipulation with Semantic Keypoints. Yuhong Deng, David Hsu. 15 Aug 2024.
Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions. Jun Hatori, Yuta Kikuchi, Sosuke Kobayashi, K. Takahashi, Yuta Tsuboi, Y. Unno, W. Ko, Jethro Tan. 17 Oct 2017.
A simple neural network module for relational reasoning. Adam Santoro, David Raposo, David Barrett, Mateusz Malinowski, Razvan Pascanu, Peter W. Battaglia, Timothy Lillicrap. 05 Jun 2017. [GNN, NAI]
Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics. Jeffrey Mahler, Jacky Liang, Sherdil Niyaz, Michael Laskey, R. Doan, Xinyu Liu, J. A. Ojea, Ken Goldberg. 27 Mar 2017. [3DPC, 3DV]
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions. Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg. 30 Dec 2016. [ObjD]
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. Justin Johnson, B. Hariharan, Laurens van der Maaten, Li Fei-Fei, C. L. Zitnick, Ross B. Girshick. 20 Dec 2016. [CoGe]
Modeling Relationships in Referential Expressions with Compositional Modular Networks. Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko. 30 Nov 2016.
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, ..., Yannis Kalantidis, Li Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li. 23 Feb 2016.
DenseCap: Fully Convolutional Localization Networks for Dense Captioning. Justin Johnson, A. Karpathy, Li Fei-Fei. 24 Nov 2015. [VLM]
Natural Language Object Retrieval. Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, Trevor Darrell. 13 Nov 2015. [ObjD]
Generation and Comprehension of Unambiguous Object Descriptions. Junhua Mao, Jonathan Huang, Alexander Toshev, Oana-Maria Camburu, Alan Yuille, Kevin Patrick Murphy. 07 Nov 2015. [ObjD]
A Joint Model of Language and Perception for Grounded Attribute Learning. Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, Dieter Fox. 27 Jun 2012. [LM&Ro]