Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.10638
Cited By
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
25 February 2020
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training"
40 / 40 papers shown
Title
VISTA: Generative Visual Imagination for Vision-and-Language Navigation
Yanjia Huang
Mingyang Wu
Renjie Li
Zhengzhong Tu
LM&Ro
56
0
0
09 May 2025
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
Junrong Yue
Yanzhe Zhang
Chuan Qin
Jing Chen
Xiaomin Lie
Xinlei Yu
Wenxin Zhang
Zhendong Zhao
79
1
0
23 Apr 2025
HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard
Yifei Dong
Fengyi Wu
Qi He
Heng Li
Minghan Li
...
Yuxuan Zhou
Jingdong Sun
Qi Dai
Zhi-Qi Cheng
Alexander G. Hauptmann
LM&Ro
67
0
0
18 Mar 2025
Q-VLM: Post-training Quantization for Large Vision-Language Models
Changyuan Wang
Ziwei Wang
Xiuwei Xu
Yansong Tang
Jie Zhou
Jiwen Lu
MQ
39
4
0
10 Oct 2024
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Yanyuan Qiao
Wenqi Lyu
Hui Wang
Zixu Wang
Zerui Li
Yuan Zhang
Mingkui Tan
Qi Wu
LRM
57
4
0
27 Sep 2024
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Bingqian Lin
Yunshuang Nie
Ziming Wei
Jiaqi Chen
Shikui Ma
Jianhua Han
Hang Xu
Xiaojun Chang
Xiaodan Liang
LM&Ro
LRM
73
22
0
12 Mar 2024
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
313
930
0
24 Sep 2019
Robust Navigation with Language Pretraining and Stochastic Sampling
Xiujun Li
Chunyuan Li
Qiaolin Xia
Yonatan Bisk
Asli Celikyilmaz
Jianfeng Gao
Noah A. Smith
Yejin Choi
LM&Ro
95
111
0
05 Sep 2019
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Khanh Nguyen
Hal Daumé
LM&Ro
EgoV
190
151
0
04 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
97
1,657
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
177
2,467
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
172
898
0
16 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
160
3,659
0
06 Aug 2019
Vision-and-Dialog Navigation
Jesse Thomason
Michael Murray
Maya Cakmak
Luke Zettlemoyer
LM&Ro
77
325
0
10 Jul 2019
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Hao Tan
Licheng Yu
Joey Tianyi Zhou
SSL
49
317
0
08 Apr 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
26
1,238
0
03 Apr 2019
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
Liyiming Ke
Xiujun Li
Yonatan Bisk
Ari Holtzman
Zhe Gan
Jingjing Liu
Jianfeng Gao
Yejin Choi
S. Srinivasa
54
168
0
06 Mar 2019
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
Chih-Yao Ma
Zuxuan Wu
G. Al-Regib
Caiming Xiong
Z. Kira
LM&Ro
45
173
0
05 Mar 2019
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Chih-Yao Ma
Jiasen Lu
Zuxuan Wu
G. Al-Regib
Z. Kira
R. Socher
Caiming Xiong
LM&Ro
26
275
0
10 Jan 2019
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
Howard Chen
Alane Suhr
Dipendra Kumar Misra
Noah Snavely
Yoav Artzi
58
384
0
29 Nov 2018
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Xin Eric Wang
Qiuyuan Huang
Asli Celikyilmaz
Jianfeng Gao
Dinghan Shen
Yuan-fang Wang
William Yang Wang
Lei Zhang
LM&Ro
SSL
49
534
0
25 Nov 2018
Shifting the Baseline: Single Modality Performance on Visual Navigation & QA
Jesse Thomason
Daniel Gordon
Yonatan Bisk
49
75
0
01 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
603
93,936
0
11 Oct 2018
On Evaluation of Embodied Navigation Agents
Peter Anderson
Angel X. Chang
Devendra Singh Chaplot
Alexey Dosovitskiy
Saurabh Gupta
...
Jana Kosecka
Jitendra Malik
Roozbeh Mottaghi
Manolis Savva
Amir Zamir
83
791
0
18 Jul 2018
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried
Ronghang Hu
Volkan Cirik
Anna Rohrbach
Jacob Andreas
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
Kate Saenko
Dan Klein
Trevor Darrell
LM&Ro
LRM
299
499
0
07 Jun 2018
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation
Xin Eric Wang
Wenhan Xiong
Hongmin Wang
William Yang Wang
49
200
0
21 Mar 2018
AI2-THOR: An Interactive 3D Environment for Visual AI
Eric Kolve
Roozbeh Mottaghi
Winson Han
Eli VanderBilt
Luca Weihs
...
Daniel Gordon
Yuke Zhu
Aniruddha Kembhavi
Abhinav Gupta
Ali Farhadi
LM&Ro
27
1,091
0
14 Dec 2017
MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments
Manolis Savva
Angel X. Chang
Alexey Dosovitskiy
Thomas Funkhouser
V. Koltun
48
247
0
11 Dec 2017
Embodied Question Answering
Abhishek Das
Samyak Datta
Georgia Gkioxari
Stefan Lee
Devi Parikh
Dhruv Batra
LM&Ro
57
642
0
30 Nov 2017
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Peter Anderson
Qi Wu
Damien Teney
Jake Bruce
Mark Johnson
Niko Sünderhauf
Ian Reid
Stephen Gould
Anton Van Den Hengel
LM&Ro
65
1,299
0
20 Nov 2017
Matterport3D: Learning from RGB-D Data in Indoor Environments
Angel X. Chang
Angela Dai
Thomas Funkhouser
Maciej Halber
Matthias Nießner
Manolis Savva
Shuran Song
Andy Zeng
Yinda Zhang
3DV
3DPC
97
1,880
0
18 Sep 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
88
4,201
0
25 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
230
129,831
0
12 Jun 2017
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Dipendra Kumar Misra
John Langford
Yoav Artzi
49
247
0
28 Apr 2017
Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding
Will Monroe
Robert D. Hawkins
Noah D. Goodman
Christopher Potts
51
122
0
29 Mar 2017
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
165
10,412
0
21 Jul 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
819
192,638
0
10 Dec 2015
Fast R-CNN
Ross B. Girshick
ObjD
225
24,933
0
30 Apr 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
262
149,474
0
22 Dec 2014
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
211
20,467
0
10 Sep 2014
1