ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2011.01819
  4. Cited By
Learning Representations from Audio-Visual Spatial Alignment

Learning Representations from Audio-Visual Spatial Alignment

3 November 2020
Pedro Morgado
Yi Li
Nuno Vasconcelos
    SSL
ArXivPDFHTML

Papers citing "Learning Representations from Audio-Visual Spatial Alignment"

26 / 26 papers shown
Title
OmniAudio: Generating Spatial Audio from 360-Degree Video
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
X. Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
53
0
0
21 Apr 2025
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
21
9
0
25 Oct 2023
Learning Fine-grained View-Invariant Representations from Unpaired
  Ego-Exo Videos via Temporal Alignment
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
31
30
0
08 Jun 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
18
17
0
22 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
34
90
0
14 May 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
What You Say Is What You Show: Visual Narration Detection in
  Instructional Videos
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
18
4
0
05 Jan 2023
Self-supervised learning of audio representations using angular
  contrastive loss
Self-supervised learning of audio representations using angular contrastive loss
Shanshan Wang
S. Tripathy
A. Mesaros
SSL
18
4
0
10 Nov 2022
Learning State-Aware Visual Representations from Audible Interactions
Learning State-Aware Visual Representations from Audible Interactions
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
70
22
0
27 Sep 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
81
64
0
30 Aug 2022
LAVA: Language Audio Vision Alignment for Contrastive Video Pre-Training
LAVA: Language Audio Vision Alignment for Contrastive Video Pre-Training
Sumanth Gurram
An Fang
David M. Chan
John F. Canny
VLM
AI4TS
30
1
0
16 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm
  Synchronization
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
Jiashuo Yu
Junfu Pu
Ying Cheng
Rui Feng
Ying Shan
14
5
0
07 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander William Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
31
79
0
16 Jun 2022
Multimodal Learning with Transformers: A Survey
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
50
525
0
13 Jun 2022
Self-supervised Contrastive Learning for Audio-Visual Action Recognition
Self-supervised Contrastive Learning for Audio-Visual Action Recognition
Yang Liu
Y. Tan
Haoyu Lan
SSL
38
5
0
28 Apr 2022
Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs
Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs
Jingfei Xia
Mingchen Zhuge
Tiantian Geng
Shun Fan
Yuantai Wei
Zhenyu He
Feng Zheng
23
14
0
08 Mar 2022
Audio-visual speech separation based on joint feature representation
  with cross-modal attention
Audio-visual speech separation based on joint feature representation with cross-modal attention
Jun Xiong
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
Yanni Zhang
20
3
0
05 Mar 2022
Audio Self-supervised Learning: A Survey
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
35
106
0
02 Mar 2022
Visual Acoustic Matching
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
16
55
0
14 Feb 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks
Sound and Visual Representation Learning with Multiple Pretraining Tasks
A. Vasudevan
Dengxin Dai
Luc Van Gool
SSL
33
6
0
04 Jan 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction
  and Lip Reading
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
28
23
0
09 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from
  Video
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
15
27
0
21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with
  Depth and Cross Modal Attention
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
31
20
0
15 Nov 2021
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video
  Representations
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari
Yi Zhu
Peter V. Gehler
Thomas Brox
132
127
0
30 Sep 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
19
34
0
01 Apr 2021
Beyond Image to Depth: Improving Depth Prediction using Echoes
Beyond Image to Depth: Improving Depth Prediction using Echoes
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
33
37
0
15 Mar 2021
1