ResearchTrend.AI
SoundNet: Learning Sound Representations from Unlabeled Video

27 October 2016
Y. Aytar
Carl Vondrick
Antonio Torralba
    SSL

Papers citing "SoundNet: Learning Sound Representations from Unlabeled Video"

50 / 180 papers shown
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
40
48
0
29 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
25
101
0
28 Jul 2020
Rethinking CNN Models for Audio Classification
Kamalesh Palanisamy
Dipika Singhania
Angela Yao
SSL
30
144
0
22 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
34
180
0
21 Jul 2020
Generating Visually Aligned Sound from Videos
Peihao Chen
Yang Zhang
Mingkui Tan
Hongdong Xiao
Deng Huang
Chuang Gan
VGen
24
95
0
14 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
28
155
0
13 Jul 2020
Predicting the Accuracy of a Few-Shot Classifier
Myriam Bontonou
Louis Bethune
Vincent Gripon
8
4
0
08 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
22
23
0
04 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
26
12
0
01 Jun 2020
Multimodal Target Speech Separation with Voice and Face References
Leyuan Qu
C. Weber
S. Wermter
CVBM
19
19
0
17 May 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
171
84
0
04 May 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
26
25
0
28 Apr 2020
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
30
38
0
08 Apr 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
21
88
0
20 Feb 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
22
19
0
17 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
27
73
0
09 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
42
428
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
33
52
0
20 Nov 2019
Deep Long Audio Inpainting
Ya-Liang Chang
Kuan-Ying Lee
Po-Yu Wu
Hung-yi Lee
Winston H. Hsu
30
33
0
15 Nov 2019
DEPA: Self-Supervised Audio Embedding for Depression Detection
Pingyue Zhang
Mengyue Wu
Heinrich Dinkel
Kai Yu
27
51
0
29 Oct 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
35
88
0
24 Oct 2019
Contrastive Representation Distillation
Yonglong Tian
Dilip Krishnan
Phillip Isola
47
1,031
0
23 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
32
41
0
19 Oct 2019
Urban Sound Tagging using Convolutional Neural Networks
Sainath Adapa
6
38
0
27 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
30
77
0
22 Sep 2019
Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
12
32
0
18 Sep 2019
Multimodal Deep Models for Predicting Affective Responses Evoked by Movies
Ha Thi Phuong Thao
Dorien Herremans
Gemma Roig
31
16
0
16 Sep 2019
The OMG-Empathy Dataset: Evaluating the Impact of Affective Behavior in Storytelling
Pablo V. A. Barros
Nikhil Churamani
Angelica Lim
S. Wermter
28
12
0
30 Aug 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
35
91
0
30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
Multi-task Self-Supervised Learning for Human Activity Detection
Aaqib Saeed
T. Ozcelebi
J. Lukkien
SSL
23
269
0
27 Jul 2019
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization
Jung-Kyun Cho
Junseok Kwon
Byung-Woo Hong
31
1
0
23 Jul 2019
Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction
Mohammed Senoussaoui
P. Cardinal
Alessandro Lameiras Koerich
24
2
0
06 Jul 2019
Learning Video Representations using Contrastive Bidirectional Transformer
Chen Sun
Fabien Baradel
Kevin Patrick Murphy
Cordelia Schmid
SSL
ViT
27
133
0
13 Jun 2019
Learning Individual Styles of Conversational Gesture
Shiry Ginosar
Amir Bar
Gefen Kohavi
Caroline Chan
Andrew Owens
Jitendra Malik
SLR
18
326
0
10 Jun 2019
How Much Does Audio Matter to Recognize Egocentric Object Interactions?
Alejandro Cartas
Jordi Luque
Petia Radeva
Carlos Segura
Mariella Dimiccoli
EgoV
15
6
0
03 Jun 2019
Machine learning in acoustics: theory and applications
Michael J. Bianco
Peter Gerstoft
James Traer
Emma Ozanich
M. Roch
Sharon Gannot
Charles-Alban Deledalle
AI4CE
28
376
0
11 May 2019
End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network
Sajjad Abdoli
P. Cardinal
Alessandro Lameiras Koerich
36
270
0
18 Apr 2019
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
17
251
0
11 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
27
69
0
11 Apr 2019
Unsupervised Feature Learning for Environmental Sound Classification Using Weighted Cycle-Consistent Generative Adversarial Network
Mohammad Esmaeilpour
P. Cardinal
Alessandro Lameiras Koerich
27
43
0
08 Apr 2019
DistInit: Learning Video Representations Without a Single Labeled Video
Rohit Girdhar
Du Tran
Lorenzo Torresani
Deva Ramanan
27
54
0
26 Jan 2019
Deep Learning for Human Affect Recognition: Insights and New Developments
Philipp V. Rouast
M. Adam
R. Chiong
35
167
0
09 Jan 2019
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
21
130
0
11 Dec 2018