ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.03112
  4. Cited By
A New Path: Scaling Vision-and-Language Navigation with Synthetic
  Instructions and Imitation Learning

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

6 October 2022
Aishwarya Kamath
Peter Anderson
Su Wang
Jing Yu Koh
Alexander Ku
Austin Waters
Yinfei Yang
Jason Baldridge
Zarana Parekh
    LM&Ro
ArXivPDFHTML

Papers citing "A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning"

50 / 67 papers shown
Title
TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation
TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation
Navid Rajabi
Jana Kosecka
LM&Ro
3DV
93
0
0
11 Feb 2025
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
Hao Sha
Yao Mu
Yuxuan Jiang
Li Chen
Chenfeng Xu
Ping Luo
Shengbo Eben Li
Masayoshi Tomizuka
Wei Zhan
Mingyu Ding
195
170
0
04 Oct 2023
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
61
704
0
14 Sep 2022
CLEAR: Improving Vision-Language Navigation with Cross-Lingual,
  Environment-Agnostic Representations
CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
Jialu Li
Hao Tan
Joey Tianyi Zhou
LM&Ro
75
12
0
05 Jul 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
158
1,089
0
22 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
79
125
0
15 Jun 2022
Habitat-Web: Learning Embodied Object-Search Strategies from Human
  Demonstrations at Scale
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Ram Ramrakhya
Eric Undersander
Dhruv Batra
Abhishek Das
LM&Ro
87
113
0
07 Apr 2022
Simple and Effective Synthesis of Indoor 3D Scenes
Simple and Effective Synthesis of Indoor 3D Scenes
Jing Yu Koh
Harsh Agrawal
Dhruv Batra
Richard Tucker
Austin Waters
Honglak Lee
Yinfei Yang
Jason Baldridge
Peter Anderson
VGen
3DV
74
30
0
06 Apr 2022
EnvEdit: Environment Editing for Vision-and-Language Navigation
EnvEdit: Environment Editing for Vision-and-Language Navigation
Jialu Li
Hao Tan
Joey Tianyi Zhou
78
81
0
29 Mar 2022
Think Global, Act Local: Dual-scale Graph Transformer for
  Vision-and-Language Navigation
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
LM&Ro
61
141
0
23 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple
  Sequence-to-Sequence Learning Framework
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
120
859
0
07 Feb 2022
Grounded Language-Image Pre-training
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
62
1,047
0
07 Dec 2021
Less is More: Generating Grounded Navigation Instructions from Landmarks
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Ceslee Montgomery
Jordi Orbay
Vighnesh Birodkar
Aleksandra Faust
Izzeddin Gur
Natasha Jaques
Austin Waters
Jason Baldridge
Peter Anderson
87
63
0
25 Nov 2021
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language
  Navigation
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
A. Moudgil
Arjun Majumdar
Harsh Agrawal
Stefan Lee
Dhruv Batra
LM&Ro
42
58
0
27 Oct 2021
History Aware Multimodal Transformer for Vision-and-Language Navigation
History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen
Pierre-Louis Guhur
Cordelia Schmid
Ivan Laptev
LM&Ro
45
228
0
25 Oct 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
99
789
0
24 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
31
136
0
20 Aug 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
242
407
0
13 Jul 2021
Vision-Language Navigation with Random Environmental Mixup
Vision-Language Navigation with Random Environmental Mixup
Chong Liu
Fengda Zhu
Xiaojun Chang
Xiaodan Liang
Zongyuan Ge
Yi-Dong Shen
LM&Ro
70
86
0
15 Jun 2021
Pathdreamer: A World Model for Indoor Navigation
Pathdreamer: A World Model for Indoor Navigation
Jing Yu Koh
Honglak Lee
Yinfei Yang
Jason Baldridge
Peter Anderson
68
83
0
18 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
150
876
0
26 Apr 2021
Diagnosing Vision-and-Language Navigation: What Really Matters
Diagnosing Vision-and-Language Navigation: What Really Matters
Wanrong Zhu
Yuankai Qi
P. Narayana
Kazoo Sone
Sugato Basu
Xinze Wang
Qi Wu
Miguel P. Eckstein
Wenjie Wang
LM&Ro
52
50
0
30 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
671
28,659
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
319
4,873
0
24 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse
  Sampling
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
96
651
0
11 Feb 2021
On the Evaluation of Vision-and-Language Navigation Instructions
On the Evaluation of Vision-and-Language Navigation Instructions
Mingde Zhao
Peter Anderson
Vihan Jain
Su Wang
Alexander Ku
Jason Baldridge
Eugene Ie
270
52
0
26 Jan 2021
A Recurrent Vision-and-Language BERT for Navigation
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong
Qi Wu
Yuankai Qi
Cristian Rodriguez-Opazo
Stephen Gould
LM&Ro
84
299
0
26 Nov 2020
Where Are You? Localization from Embodied Dialog
Where Are You? Localization from Embodied Dialog
Meera Hahn
Jacob Krantz
Dhruv Batra
Devi Parikh
James M. Rehg
Stefan Lee
Peter Anderson
LM&Ro
36
27
0
16 Nov 2020
Sim-to-Real Transfer for Vision-and-Language Navigation
Sim-to-Real Transfer for Vision-and-Language Navigation
Peter Anderson
Ayush Shrivastava
Joanne Truong
Arjun Majumdar
Devi Parikh
Dhruv Batra
Stefan Lee
LM&Ro
62
106
0
07 Nov 2020
mT5: A massively multilingual pre-trained text-to-text transformer
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
91
2,489
0
22 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
342
40,217
0
22 Oct 2020
Language and Visual Entity Relationship Graph for Agent Navigation
Language and Visual Entity Relationship Graph for Agent Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Yuankai Qi
Qi Wu
Stephen Gould
LM&Ro
215
132
0
19 Oct 2020
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense
  Spatiotemporal Grounding
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
Alexander Ku
Peter Anderson
Roma Patel
Eugene Ie
Jason Baldridge
62
305
0
15 Oct 2020
Diagnosing the Environment Bias in Vision-and-Language Navigation
Diagnosing the Environment Bias in Vision-and-Language Navigation
Yubo Zhang
Hao Tan
Joey Tianyi Zhou
39
55
0
06 May 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the
  Web
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
115
231
0
30 Apr 2020
Experience Grounds Language
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
62
354
0
21 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
70
1,927
0
13 Apr 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via
  Pre-training
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
47
275
0
25 Feb 2020
Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for
  Language Grounding Tasks in Street View
Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View
Harsh Mehta
Yoav Artzi
Jason Baldridge
Eugene Ie
Piotr Wojciech Mirowski
31
25
0
10 Jan 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
260
19,824
0
23 Oct 2019
Robust Navigation with Language Pretraining and Stochastic Sampling
Robust Navigation with Language Pretraining and Stochastic Sampling
Xiujun Li
Chunyuan Li
Qiaolin Xia
Yonatan Bisk
Asli Celikyilmaz
Jianfeng Gao
Noah A. Smith
Yejin Choi
LM&Ro
106
111
0
05 Sep 2019
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
198
2,467
0
20 Aug 2019
Vision-and-Dialog Navigation
Vision-and-Dialog Navigation
Jesse Thomason
Michael Murray
Maya Cakmak
Luke Zettlemoyer
LM&Ro
86
325
0
10 Jul 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
87
17,950
0
28 May 2019
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor
  Environments
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
Yuankai Qi
Qi Wu
Peter Anderson
Xinze Wang
Wenjie Wang
Chunhua Shen
Anton Van Den Hengel
LM&Ro
76
320
0
23 Apr 2019
Learning to Navigate Unseen Environments: Back Translation with
  Environmental Dropout
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Hao Tan
Licheng Yu
Joey Tianyi Zhou
SSL
53
317
0
08 Apr 2019
Habitat: A Platform for Embodied AI Research
Habitat: A Platform for Embodied AI Research
Manolis Savva
Abhishek Kadian
Oleksandr Maksymets
Yili Zhao
Erik Wijmans
...
Jia-Wei Liu
V. Koltun
Jitendra Malik
Devi Parikh
Dhruv Batra
LM&Ro
85
1,390
0
02 Apr 2019
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Chih-Yao Ma
Jiasen Lu
Zuxuan Wu
G. Al-Regib
Z. Kira
R. Socher
Caiming Xiong
LM&Ro
53
275
0
10 Jan 2019
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual
  Street Environments
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
Howard Chen
Alane Suhr
Dipendra Kumar Misra
Noah Snavely
Yoav Artzi
68
384
0
29 Nov 2018
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning
  for Vision-Language Navigation
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Xin Eric Wang
Qiuyuan Huang
Asli Celikyilmaz
Jianfeng Gao
Dinghan Shen
Yuan-fang Wang
William Yang Wang
Lei Zhang
LM&Ro
SSL
63
534
0
25 Nov 2018
12
Next