Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

21 November 2021
Rishabh Garg
Ruohan Gao
Kristen Grauman

Papers citing "Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video"

50 / 54 papers shown
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
97
0
0
21 Apr 2025
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
91
4
0
14 Oct 2024
Move2Hear: Active Audio-Visual Source Separation
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
56
44
0
15 May 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
46
57
0
13 Apr 2021
Semantic Audio-Visual Navigation
Changan Chen
Ziad Al-Halah
Kristen Grauman
92
106
0
21 Dec 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
71
122
0
03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
72
71
0
02 Nov 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
59
135
0
12 Oct 2020
Foley Music: Learning to Generate Music from Videos
Chuang Gan
Deng Huang
Peihao Chen
J. Tenenbaum
Antonio Torralba
VGen
46
139
0
21 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
97
184
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
72
82
0
20 Jul 2020
Generating Visually Aligned Sound from Videos
Peihao Chen
Yang Zhang
Mingkui Tan
Hongdong Xiao
Deng Huang
Chuang Gan
VGen
77
96
0
14 Jul 2020
See, Hear, Explore: Curiosity via Audio-Visual Association
Victoria Dean
Shubham Tulsiani
Abhinav Gupta
85
59
0
07 Jul 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
82
76
0
11 Jun 2020
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
109
66
0
14 May 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
210
84
0
04 May 2020
Music Gesture for Visual Sound Separation
Chuang Gan
Deng Huang
Hang Zhao
J. Tenenbaum
Antonio Torralba
88
204
0
20 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
71
54
0
30 Mar 2020
Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Jianwei Yu
Shi-Xiong Zhang
Jian Wu
Shahram Ghorbani
Bo Wu
Shiyin Kang
Shansong Liu
Xunying Liu
Helen Meng
Dong Yu
76
73
0
06 Jan 2020
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum
62
138
0
25 Dec 2019
BatVision: Learning to See 3D Spatial Layout with Two Ears
J. H. Christensen
Sascha Hornauer
Stella X. Yu
40
57
0
15 Dec 2019
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
83
252
0
10 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
496
42,449
0
03 Dec 2019
Scene-Aware Audio Rendering via Deep Acoustic Analysis
Zhenyu Tang
Nicholas J. Bryan
Dingzeyu Li
Timothy R. Langlois
Tianyi Zhou
52
41
0
14 Nov 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
136
88
0
24 Oct 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
70
91
0
30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
57
337
0
22 Aug 2019
My lips are concealed: Audio-visual speech enhancement through obstructions
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
65
91
0
11 Jul 2019
The Replica Dataset: A Digital Replica of Indoor Spaces
Julian Straub
Thomas Whelan
Lingni Ma
Yufan Chen
Erik Wijmans
...
H. Strasdat
R. D. Nardi
Michael Goesele
S. Lovegrove
Richard Newcombe
3DV
134
853
0
13 Jun 2019
Self-supervised Audio Spatialization with Correspondence Classifier
Yu-Ding Lu
Hsin-Ying Lee
Hung-Yu Tseng
Ming-Hsuan Yang
36
22
0
14 May 2019
Self-Supervised Audio-Visual Co-Segmentation
Andrew Rouditchenko
Hang Zhao
Chuang Gan
Josh H. McDermott
Antonio Torralba
VLM
SSL
62
105
0
18 Apr 2019
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
126
209
0
16 Apr 2019
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
80
254
0
11 Apr 2019
Habitat: A Platform for Embodied AI Research
Manolis Savva
Abhishek Kadian
Oleksandr Maksymets
Yili Zhao
Erik Wijmans
...
Jia-Wei Liu
V. Koltun
Jitendra Malik
Devi Parikh
Dhruv Batra
LM&Ro
115
1,407
0
02 Apr 2019
GANSynth: Adversarial Neural Audio Synthesis
Jesse Engel
Kumar Krishna Agrawal
Shuo Chen
Ishaan Gulrajani
Chris Donahue
Adam Roberts
92
392
0
23 Feb 2019
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
106
130
0
11 Dec 2018
Self-Supervised Generation of Spatial Audio for 360 Video
Pedro Morgado
Nuno Vasconcelos
Timothy R. Langlois
Oliver Wang
MDE
62
173
0
07 Sep 2018
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Hang Zhou
Yu Liu
Ziwei Liu
Ping Luo
Xiaogang Wang
CVBM
92
441
0
20 Jul 2018
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
77
360
0
11 Apr 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
96
752
0
10 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
102
536
0
09 Apr 2018
Learning to Separate Object Sounds by Watching Unlabeled Video
Ruohan Gao
Rogerio Feris
Kristen Grauman
SSL
65
285
0
05 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
99
435
0
23 Mar 2018
Learning to Localize Sound Source in Visual Scenes
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
66
344
0
10 Mar 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
98
530
0
18 Dec 2017
Visual to Sound: Generating Natural Sound for Videos in the Wild
Yipin Zhou
Zhaowen Wang
Chen Fang
Trung Bui
Tamara L. Berg
VGen
71
208
0
04 Dec 2017
Matterport3D: Learning from RGB-D Data in Indoor Environments
Angel X. Chang
Angela Dai
Thomas Funkhouser
Maciej Halber
Matthias Nießner
Manolis Savva
Shuran Song
Andy Zeng
Yinda Zhang
3DV
3DPC
186
1,901
0
18 Sep 2017
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
115
905
0
23 May 2017
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
258
790
0
16 Nov 2016
SoundNet: Learning Sound Representations from Unlabeled Video
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
115
1,044
0
27 Oct 2016