ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1804.03641
  4. Cited By
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018
Andrew Owens
Alexei A. Efros
    SSL
ArXivPDFHTML

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

37 / 187 papers shown
Title
Conditioned Source Separation for Music Instrument Performances
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
30
38
0
08 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
25
54
0
30 Mar 2020
A Metric Learning Reality Check
A Metric Learning Reality Check
Kevin Musgrave
Serge J. Belongie
Ser-Nam Lim
38
475
0
18 Mar 2020
Watching the World Go By: Representation Learning from Unlabeled Videos
Watching the World Go By: Representation Learning from Unlabeled Videos
Daniel Gordon
Kiana Ehsani
D. Fox
Ali Farhadi
SSL
AI4TS
29
87
0
18 Mar 2020
Evolving Losses for Unsupervised Video Representation Learning
Evolving Losses for Unsupervised Video Representation Learning
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
21
138
0
26 Feb 2020
Self-Supervised Joint Encoding of Motion and Appearance for First Person
  Action Recognition
Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition
M. Planamente
A. Bottino
Barbara Caputo
EgoV
15
3
0
10 Feb 2020
Audiovisual SlowFast Networks for Video Recognition
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
206
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
24
73
0
09 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
33
428
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and
  Applications
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
33
52
0
20 Nov 2019
MMTM: Multimodal Transfer Module for CNN Fusion
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
18
277
0
20 Nov 2019
DEPA: Self-Supervised Audio Embedding for Depression Detection
DEPA: Self-Supervised Audio Embedding for Depression Detection
Pingyue Zhang
Mengyue Wu
Heinrich Dinkel
Kai Yu
27
51
0
29 Oct 2019
PRNet: Self-Supervised Learning for Partial-to-Partial Registration
PRNet: Self-Supervised Learning for Partial-to-Partial Registration
Yue Wang
Justin Solomon
SSL
3DPC
16
379
0
27 Oct 2019
Vision-Infused Deep Audio Inpainting
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
27
88
0
24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual
  Zeroshot Classification and Retrieval of Videos
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
27
41
0
19 Oct 2019
Learning to Have an Ear for Face Super-Resolution
Learning to Have an Ear for Face Super-Resolution
Givi Meishvili
Simon Jenni
Paolo Favaro
SupR
CVBM
33
23
0
27 Sep 2019
CochleaNet: A Robust Language-independent Audio-Visual Model for Speech
  Enhancement
CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement
M. Gogate
K. Dashtipour
Ahsan Adeel
Amir Hussain
15
53
0
23 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event
  Captioning
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
30
77
0
22 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
32
91
0
30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action
  Recognition
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
Making Sense of Vision and Touch: Learning Multimodal Representations
  for Contact-Rich Tasks
Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks
Michelle A. Lee
Yuke Zhu
Peter Zachares
Matthew Tan
K. Srinivasan
Silvio Savarese
Fei-Fei Li
Animesh Garg
Jeannette Bohg
SSL
23
207
0
28 Jul 2019
Multi-task Self-Supervised Learning for Human Activity Detection
Multi-task Self-Supervised Learning for Human Activity Detection
Aaqib Saeed
T. Ozcelebi
J. Lukkien
SSL
21
268
0
27 Jul 2019
Adaptive Regularization via Residual Smoothing in Deep Learning
  Optimization
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization
Jung-Kyun Cho
Junseok Kwon
Byung-Woo Hong
28
1
0
23 Jul 2019
Machine learning in acoustics: theory and applications
Machine learning in acoustics: theory and applications
Michael J. Bianco
Peter Gerstoft
James Traer
Emma Ozanich
M. Roch
Sharon Gannot
Charles-Alban Deledalle
AI4CE
25
375
0
11 May 2019
Audio-Visual Model Distillation Using Acoustic Images
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
Co-Separating Sounds of Visual Objects
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
27
205
0
16 Apr 2019
The Sound of Motions
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
17
250
0
11 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
21
69
0
11 Apr 2019
An Attempt towards Interpretable Audio-Visual Video Captioning
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Self-Supervised Generation of Spatial Audio for 360 Video
Self-Supervised Generation of Spatial Audio for 360 Video
Pedro Morgado
Nuno Vasconcelos
Timothy R. Langlois
Oliver Wang
MDE
16
171
0
07 Sep 2018
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
236
2,233
0
14 Jun 2018
The Sound of Pixels
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
527
0
09 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
31
425
0
23 Mar 2018
Objects that Sound
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
38
528
0
18 Dec 2017
Lip Reading Sentences in the Wild
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
164
784
0
16 Nov 2016
Previous
1234