ResearchTrend.AI
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

20 November 2017
Peter Anderson
Qi Wu
Damien Teney
Jake Bruce
Mark Johnson
Niko Sünderhauf
Ian Reid
Stephen Gould
Anton Van Den Hengel
    LM&Ro

Papers citing "Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments"

50 / 307 papers shown
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
24
4
0
05 Jan 2023
Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following
K. Akuzawa
Yusuke Iwasawa
Yutaka Matsuo
GAN
33
0
0
29 Dec 2022
Benchmarking Spatial Relationships in Text-to-Image Generation
Tejas Gokhale
Hamid Palangi
Besmira Nushi
Vibhav Vineet
Eric Horvitz
Ece Kamar
Chitta Baral
Yezhou Yang
EGVM
45
66
0
20 Dec 2022
Continual Learning for Instruction Following from Realtime Feedback
Alane Suhr
Yoav Artzi
26
17
0
19 Dec 2022
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries
Jinjie Mai
Abdullah Hamdi
Silvio Giancola
Chen Zhao
Guohao Li
EgoV
38
14
0
14 Dec 2022
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Chan Hee Song
Jiaman Wu
Clay Washington
Brian M Sadler
Wei-Lun Chao
Yu-Chuan Su
LLMAG
LM&Ro
36
383
0
08 Dec 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Hao Li
Yizhi Zhang
Junzhe Zhu
Shaoxiong Wang
Michelle A. Lee
Huazhe Xu
Edward H. Adelson
Li Fei-Fei
Ruohan Gao
Jiajun Wu
32
58
0
07 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
57
309
0
06 Dec 2022
PEANUT: Predicting and Navigating to Unseen Targets
Albert J. Zhai
Shenlong Wang
24
19
0
05 Dec 2022
Navigating to Objects in the Real World
Théophile Gervet
Soumith Chintala
Dhruv Batra
Jitendra Malik
Devendra Singh Chaplot
41
122
0
02 Dec 2022
A General Purpose Supervisory Signal for Embodied Agents
Kunal Pratap Singh
Jordi Salvador
Luca Weihs
Aniruddha Kembhavi
SSL
26
3
0
01 Dec 2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang
William Zhu
Baoxiong Jia
Zan Wang
Xiaojian Ma
Qing Li
Siyuan Huang
37
5
0
28 Nov 2022
Predicting Topological Maps for Visual Navigation in Unexplored Environments
Huangying Zhan
Hamid Rezatofighi
Ian Reid
34
0
0
23 Nov 2022
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
Kunal Pratap Singh
Luca Weihs
Alvaro Herrasti
Jonghyun Choi
Aniruddha Kembhavi
Roozbeh Mottaghi
13
19
0
18 Nov 2022
Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following
Y. Inoue
Hiroki Ohashi
LM&Ro
30
43
0
07 Nov 2022
lilGym: Natural Language Visual Reasoning with Reinforcement Learning
Anne Wu
Kianté Brantley
Noriyuki Kojima
Yoav Artzi
ReLM
OffRL
LRM
27
3
0
03 Nov 2022
Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object Transport
S. Narayanan
Dinesh Jayaraman
Manmohan Chandraker
24
1
0
28 Oct 2022
Bridging the visual gap in VLN via semantically richer instructions
Joaquín Ossandón
Benjamín Earle
Alvaro Soto
35
0
0
27 Oct 2022
DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents
Ziqiao Ma
B. VanDerPloeg
Cristian-Paul Bara
Yidong Huang
Eui-In Kim
Felix Gervits
M. Marge
J. Chai
63
7
0
22 Oct 2022
DANLI: Deliberative Agent for Following Natural Language Instructions
Yichi Zhang
Jianing Yang
Jiayi Pan
Shane Storks
N. Devraj
Ziqiao Ma
Keunwoo Peter Yu
Yuwei Bao
J. Chai
LM&Ro
52
16
0
22 Oct 2022
ULN: Towards Underspecified Vision-and-Language Navigation
Weixi Feng
Tsu-jui Fu
Yujie Lu
William Yang Wang
49
5
0
18 Oct 2022
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
Zan Wang
Yixin Chen
Tengyu Liu
Yixin Zhu
Wei Liang
Siyuan Huang
43
104
0
18 Oct 2022
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments
Sudipta Paul
A. Roy-Chowdhury
A. Cherian
33
23
0
14 Oct 2022
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation
Peihao Chen
Dongyu Ji
Kun-Li Channing Lin
Runhao Zeng
Thomas H. Li
Mingkui Tan
Chuang Gan
SSL
36
62
0
14 Oct 2022
Learning Active Camera for Multi-Object Navigation
Peihao Chen
Dongyu Ji
Kun-Li Channing Lin
Weiwen Hu
Wenbing Huang
Thomas H. Li
Mingkui Tan
Chuang Gan
33
24
0
14 Oct 2022
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training
Meera Hahn
James M. Rehg
LM&Ro
40
4
0
10 Oct 2022
Learning a Visually Grounded Memory Assistant
Meera Hahn
Kevin Carlberg
Ruta Desai
James M. Hillis
27
1
0
07 Oct 2022
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Aishwarya Kamath
Peter Anderson
Su Wang
Jing Yu Koh
Alexander Ku
Austin Waters
Yinfei Yang
Jason Baldridge
Zarana Parekh
LM&Ro
22
45
0
06 Oct 2022
Iterative Vision-and-Language Navigation
Jacob Krantz
Shurjo Banerjee
Wang Zhu
Jason J. Corso
Peter Anderson
Stefan Lee
Jesse Thomason
LM&Ro
40
18
0
06 Oct 2022
LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation
Yue Zhang
Parisa Kordjamshidi
33
11
0
26 Sep 2022
Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline
Lichen Zhao
Daigang Cai
Jing Zhang
Lu Sheng
Dong Xu
Ruizhi Zheng
Yinjie Zhao
Lipeng Wang
Xibo Fan
6
23
0
24 Sep 2022
Ground then Navigate: Language-guided Navigation in Dynamic Scenes
Kanishk Jain
Varun Chhangani
Amogh Tiwari
K. M. Krishna
Vineet Gandhi
LM&Ro
18
27
0
24 Sep 2022
PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training
Rogerio Bonatti
Sai H. Vemprala
Shuang Ma
Felipe Vieira Frujeri
Shuhang Chen
Ashish Kapoor
33
22
0
22 Sep 2022
Anticipating the Unseen Discrepancy for Vision and Language Navigation
Yujie Lu
Huiliang Zhang
Ping Nie
Weixi Feng
Wenda Xu
Qing Guo
William Yang Wang
35
1
0
10 Sep 2022
LATTE: LAnguage Trajectory TransformEr
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Sai H. Vemprala
Rogerio Bonatti
LM&Ro
39
59
0
04 Aug 2022
Equivariant and Invariant Grounding for Video Question Answering
Yicong Li
Xiang Wang
Junbin Xiao
Tat-Seng Chua
20
25
0
26 Jul 2022
TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors
Gabriel H. Sarch
Zhaoyuan Fang
Adam W. Harley
Paul Schydlo
Michael J. Tarr
Saurabh Gupta
Katerina Fragkiadaki
LM&Ro
23
45
0
21 Jul 2022
Target-Driven Structured Transformer Planner for Vision-Language Navigation
Yusheng Zhao
Jinyu Chen
Chen Gao
Wenguan Wang
Lirong Yang
Haibing Ren
Huaxia Xia
Si Liu
LM&Ro
27
57
0
19 Jul 2022
Reasoning about Actions over Visual and Linguistic Modalities: A Survey
Shailaja Keyur Sampat
Maitreya Patel
Subhasish Das
Yezhou Yang
Chitta Baral
ReLM
LM&Ro
LRM
24
12
0
15 Jul 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah
B. Osinski
Brian Ichter
Sergey Levine
LM&Ro
158
436
0
10 Jul 2022
CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
Jialu Li
Hao Tan
Joey Tianyi Zhou
LM&Ro
64
12
0
05 Jul 2022
Leveraging Language for Accelerated Learning of Tool Manipulation
Allen Z. Ren
Bharat Govil
Tsung-Yen Yang
Karthik Narasimhan
Anirudha Majumdar
LM&Ro
22
37
0
27 Jun 2022
Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation
Jenny Zhang
Samson Yu
Jiafei Duan
Cheston Tan
31
4
0
20 Jun 2022
Local Slot Attention for Vision-and-Language Navigation
Yifeng Zhuang
Qiang Sun
Yanwei Fu
Lifeng Chen
Xiangyang Xue
21
2
0
17 Jun 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Kai Zheng
Xiaotong Chen
Odest Chadwicke Jenkins
Qing Guo
LM&Ro
CoGe
21
54
0
17 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
33
79
0
16 Jun 2022
FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation
Zi-Yi Dou
Nanyun Peng
24
22
0
09 Jun 2022
GraphMapper: Efficient Visual Navigation by Scene Graph Generation
Zachary Seymour
Niluthpol Chowdhury Mithun
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
24
8
0
17 May 2022
Embodied Navigation at the Art Gallery
Roberto Bigazzi
Federico Landi
S. Cascianelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
21
3
0
19 Apr 2022
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Ram Ramrakhya
Eric Undersander
Dhruv Batra
Abhishek Das
LM&Ro
29
109
0
07 Apr 2022