Multimodal Perception for Goal-oriented Navigation: A Survey

22 April 2025
I-Tak Ieong
Hao Tang
LM&Ro, LRM
arXiv: 2504.15643

Papers citing "Multimodal Perception for Goal-oriented Navigation: A Survey"

50 / 111 papers shown
Enhancing Exploratory Capability of Visual Navigation Using Uncertainty of Implicit Scene Representation
Yanjie Wang
Qiming Liu
Zhe Liu
Hesheng Wang
68
1
0
05 Nov 2024
Diffusion as Reasoning: Enhancing Object Navigation via Diffusion Model Conditioned on LLM-based Object-Room Knowledge
Yiming Ji
Kaijie Yun
Yang Liu
Zhengpu Wang
Boyu Ma
Zongwu Xie
Hong Liu
DiffM
34
1
0
29 Oct 2024
Object Instance Retrieval in Assistive Robotics: Leveraging Fine-Tuned SimSiam with Multi-View Images Based on 3D Semantic Map
Taichi Sakaguchi
Akira Taniguchi
Y. Hagiwara
Lotfi El Hafi
Shoichi Hasegawa
T. Taniguchi
64
4
0
15 Apr 2024
MemoNav: Working Memory Model for Visual Navigation
Hongxin Li
Zeyu Wang
Xueke Yang
Yu-Ren Yang
Shuqi Mei
Zhaoxiang Zhang
93
5
0
29 Feb 2024
Aligning Knowledge Graph with Visual Perception for Object-goal Navigation
Nuo Xu
Wen Wang
Rong Yang
Mengjie Qin
Zheyuan Lin
Wei Song
Chunlong Zhang
J. Gu
Chao Li
86
9
0
29 Feb 2024
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
Yuxuan Kuang
Hai Lin
Meng Jiang
LM&Ro
85
33
0
16 Feb 2024
GOAT: GO to Any Thing
Matthew Chang
Théophile Gervet
Mukul Khanna
Sriram Yenamandra
Dhruv Shah
...
Saurabh Gupta
Dhruv Batra
Roozbeh Mottaghi
Jitendra Malik
Devendra Singh Chaplot
88
74
0
10 Nov 2023
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration
A. Sridhar
Dhruv Shah
Catherine Glossop
Sergey Levine
110
127
0
11 Oct 2023
FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation
Xinyu Sun
Peihao Chen
Jugang Fan
Thomas H. Li
Jian Chen
Mingkui Tan
71
14
0
11 Oct 2023
Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation
Jinyu Chen
Wenguan Wang
Siying Liu
Hongsheng Li
Yi Yang
86
8
0
20 Aug 2023
Scaling Open-Vocabulary Object Detection
Matthias Minderer
A. Gritsenko
N. Houlsby
VLM, ObjD
103
201
0
16 Jun 2023
CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments
Xiulong Liu
Sudipta Paul
Moitreya Chatterjee
A. Cherian
71
9
0
06 Jun 2023
L3MVN: Leveraging Large Language Models for Visual Target Navigation
Bangguo Yu
Hamidreza Kasaei
M. Cao
LM&Ro
94
101
0
11 Apr 2023
ENTL: Embodied Navigation Trajectory Learner
Klemen Kotar
Aaron Walsman
Roozbeh Mottaghi
66
7
0
05 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP, VLM
257
1,200
0
27 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,761
0
15 Mar 2023
OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav
Karmesh Yadav
Arjun Majumdar
Ram Ramrakhya
Naoki Yokoyama
Alexei Baevski
Z. Kira
Oleksandr Maksymets
Dhruv Batra
ViT
94
48
0
14 Mar 2023
Audio Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
VGen
112
36
0
13 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
191
2,023
0
09 Mar 2023
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Cheng Chi
Zhenjia Xu
S. Feng
Eric A. Cousineau
Yilun Du
Benjamin Burchfiel
Russ Tedrake
Shuran Song
349
1,242
0
07 Mar 2023
Renderable Neural Radiance Map for Visual Navigation
Obin Kwon
Jeongho Park
Songhwai Oh
95
55
0
01 Mar 2023
ConceptFusion: Open-set Multimodal 3D Mapping
Krishna Murthy Jatavallabhula
Ali Kuwajerwala
Qiao Gu
Mohd. Omama
Tao Chen
...
Celso Miguel de Melo
Madhava Krishna
Liam Paull
Florian Shkurti
Antonio Torralba
83
246
0
14 Feb 2023
ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation
KAI-QING Zhou
Kai Zheng
Connor Pryor
Yilin Shen
Hongxia Jin
Lise Getoor
Xinze Wang
104
118
0
30 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
432
4,656
0
30 Jan 2023
PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav
Ram Ramrakhya
Dhruv Batra
Erik Wijmans
Abhishek Das
OffRL
141
61
0
18 Jan 2023
PEANUT: Predicting and Navigating to Unseen Targets
Albert J. Zhai
Shenlong Wang
73
23
0
05 Dec 2022
3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
Jiazhao Zhang
Liu Dai
Fanpeng Meng
Qingnan Fan
Xuelin Chen
Kai Xu
He Wang
3DPC
90
40
0
01 Dec 2022
Last-Mile Embodied Visual Navigation
Justin Wasserman
Karmesh Yadav
Girish Chowdhary
Abhi Gupta
Unnat Jain
102
34
0
21 Nov 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
121
71
0
19 Oct 2022
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments
Sudipta Paul
Amit K. Roy-Chowdhury
A. Cherian
63
25
0
14 Oct 2022
DreamFusion: Text-to-3D using 2D Diffusion
Ben Poole
Ajay Jain
Jonathan T. Barron
B. Mildenhall
177
2,439
0
29 Sep 2022
Topological Semantic Graph Memory for Image-Goal Navigation
Nuri Kim
Obin Kwon
Hwiyeon Yoo
Yunho Choi
Jeongho Park
Songhwai Oh
104
53
0
17 Sep 2022
Unsupervised Visual Representation Learning by Synchronous Momentum Grouping
Bo Pang
Yifan Zhang
Yaoyi Li
Jia Cai
Cewu Lu
SSL
65
28
0
13 Jul 2022
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Chien-Yao Wang
Alexey Bochkovskiy
H. Liao
ObjD
170
6,571
0
06 Jul 2022
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
Arjun Majumdar
Gunjan Aggarwal
Bhavika Devnani
Judy Hoffman
Dhruv Batra
LM&Ro
206
163
0
24 Jun 2022
What do navigation agents learn about their environment?
Kshitij Dwivedi
Gemma Roig
Aniruddha Kembhavi
Roozbeh Mottaghi
73
12
0
17 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
91
86
0
16 Jun 2022
Is Mapping Necessary for Realistic PointGoal Navigation?
Ruslan Partsey
Erik Wijmans
Naoki Yokoyama
Oles Dobosevych
Dhruv Batra
Oleksandr Maksymets
3DPC
69
44
0
02 Jun 2022
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras
M. Aittala
Timo Aila
S. Laine
DiffM
225
2,033
0
01 Jun 2022
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Ram Ramrakhya
Eric Undersander
Dhruv Batra
Abhishek Das
LM&Ro
122
119
0
07 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM, LRM
535
6,301
0
05 Apr 2022
Object Memory Transformer for Object Goal Navigation
Rui Fukushima
Keita Ota
Asako Kanezaki
Y. Sasaki
Yusuke Yoshiyasu
64
35
0
24 Mar 2022
Uncertainty-driven Planner for Exploration and Navigation
G. Georgakis
Bernadette Bucher
Anton Arapin
Karl Schmeckpeper
Nikolai Matni
Kostas Daniilidis
77
53
0
24 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
853
9,714
0
28 Jan 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan
Daniel De Freitas
Jamie Hall
Noam M. Shazeer
Apoorv Kulshreshtha
...
Blaise Aguera-Arcas
Claire Cui
M. Croak
Ed H. Chi
Quoc Le
ALM
146
1,601
0
20 Jan 2022
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Huayu Chen
Nayeon Lee
VLM
126
386
0
22 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD, VLM
136
1,067
0
07 Dec 2021
Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds
Abdelrahman Younes
Daniel Honerkamp
Tim Welschehold
Abhinav Valada
84
42
0
29 Nov 2021
Simple but Effective: CLIP Embeddings for Embodied AI
Apoorv Khandelwal
Luca Weihs
Roozbeh Mottaghi
Aniruddha Kembhavi
VLM, LM&Ro
101
230
0
18 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM
477
7,827
0
11 Nov 2021