1912.11684
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
25 December 2019
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum
Papers citing "Look, Listen, and Act: Towards Audio-Visual Embodied Navigation"
41 / 41 papers shown
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
38
3
0
18 Jul 2024
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto
Sascha Hornauer
Fabien Moutarde
56
1
0
28 May 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya Zhang
Yanfeng Wang
39
10
0
17 Mar 2024
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
33
3
0
10 Oct 2023
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
36
18
0
19 Sep 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects
Samuel Clarke
Ruohan Gao
Mason Wang
M. Rau
Julia Xu
Jui-Hsien Wang
Doug L. James
Jiajun Wu
40
9
0
16 Jun 2023
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Sagnik Majumder
Hao Jiang
Pierre Moulon
E. Henderson
P. Calamia
Kristen Grauman
V. Ithapu
EgoV
35
7
0
04 Jan 2023
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
AI4CE
23
10
0
24 Dec 2022
A General Purpose Supervisory Signal for Embodied Agents
Kunal Pratap Singh
Jordi Salvador
Luca Weihs
Aniruddha Kembhavi
SSL
26
3
0
01 Dec 2022
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
Kunal Pratap Singh
Luca Weihs
Alvaro Herrasti
Jonghyun Choi
Aniruddha Kembhavi
Roozbeh Mottaghi
13
19
0
18 Nov 2022
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
Zan Wang
Yixin Chen
Tengyu Liu
Yixin Zhu
Wei Liang
Siyuan Huang
43
104
0
18 Oct 2022
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments
Sudipta Paul
A. Roy-Chowdhury
A. Cherian
33
23
0
14 Oct 2022
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation
Peihao Chen
Dongyu Ji
Kun-Li Channing Lin
Runhao Zeng
Thomas H. Li
Mingkui Tan
Chuang Gan
SSL
36
62
0
14 Oct 2022
Learning Active Camera for Multi-Object Navigation
Peihao Chen
Dongyu Ji
Kun-Li Channing Lin
Weiwen Hu
Wenbing Huang
Thomas H. Li
Mingkui Tan
Chuang Gan
33
24
0
14 Oct 2022
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
Tanvir Mahmud
Diana Marculescu
CLIP
16
31
0
11 Oct 2022
Anticipating the Unseen Discrepancy for Vision and Language Navigation
Yujie Lu
Huiliang Zhang
Ping Nie
Weixi Feng
Wenda Xu
Qing Guo
William Yang Wang
35
1
0
10 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
18
8
0
04 Aug 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
52
19
0
07 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
36
80
0
16 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
44
237
0
14 Jun 2022
Learning Neural Acoustic Fields
Andrew F. Luo
Yilun Du
Michael J. Tarr
J. Tenenbaum
Antonio Torralba
Chuang Gan
AI4CE
20
77
0
04 Apr 2022
Sound Adversarial Audio-Visual Navigation
Yinfeng Yu
Wenbing Huang
Gang Hua
Changan Chen
Yikai Wang
Xiaohong Liu
AAML
24
29
0
22 Feb 2022
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
21
56
0
14 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources
Sagnik Majumder
Kristen Grauman
27
21
0
02 Feb 2022
Symmetry-aware Neural Architecture for Embodied Visual Navigation
Shuang Liu
Takayuki Okatani
34
1
0
17 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
15
28
0
21 Nov 2021
Audio-Visual Grounding Referring Expression for Robotic Manipulation
Yefei Wang
Kaili Wang
Yi Wang
Di Guo
Huaping Liu
F. Sun
38
12
0
22 Sep 2021
Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
Qi Wu
Cheng-Ju Wu
Yixin Zhu
Jungseock Joo
43
14
0
05 Aug 2021
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
Prithvijit Chattopadhyay
Judy Hoffman
Roozbeh Mottaghi
Aniruddha Kembhavi
25
55
0
08 Jun 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
31
37
0
05 Apr 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
47
73
0
01 Jan 2021
Semantic Audio-Visual Navigation
Changan Chen
Ziad Al-Halah
Kristen Grauman
50
104
0
21 Dec 2020
Occupancy Anticipation for Efficient Exploration and Navigation
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
EgoV
3DPC
22
162
0
21 Aug 2020
Generating Visually Aligned Sound from Videos
Peihao Chen
Yang Zhang
Mingkui Tan
Hongdong Xiao
Deng Huang
Chuang Gan
VGen
16
95
0
14 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
28
155
0
13 Jul 2020
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation
Chuang Gan
Jeremy Schwartz
S. Alter
Damian Mrowca
Martin Schrimpf
...
Antonio Torralba
J. DiCarlo
J. Tenenbaum
Josh H. McDermott
Daniel L. K. Yamins
VGen
53
305
0
09 Jul 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
22
23
0
04 Jun 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
171
84
0
04 May 2020
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried
Ronghang Hu
Volkan Cirik
Anna Rohrbach
Jacob Andreas
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
Kate Saenko
Dan Klein
Trevor Darrell
LM&Ro
LRM
260
498
0
07 Jun 2018
CAD2RL: Real Single-Image Flight without a Single Real Image
Fereshteh Sadeghi
Sergey Levine
SSL
243
809
0
13 Nov 2016