Look, Listen and Learn

23 May 2017

Papers citing "Look, Listen and Learn"

38 / 238 papers shown

Title | Authors | Tags | Counts | Date
Self-labelling via simultaneous clustering and representation learning | Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi | SSL | 42 761 0 | 13 Nov 2019
Vision-Infused Deep Audio Inpainting | Hang Zhou, Ziwei Liu, Lingfeng Guo, Ping Luo, Dahua Lin | - | 35 88 0 | 24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos | Kranti K. Parida, Neeraj Matiyali, T. Guha, Gaurav Sharma | VLM | 32 41 0 | 19 Oct 2019
Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals | Shah Nawaz, Muhammad Kamran Janjua, I. Gallo, Arif Mahmood, Alessandro Calefati | - | 14 32 0 | 18 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net | Xudong Xu, Bo Dai, Dahua Lin | - | 35 91 0 | 30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition | Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen | EgoV | 16 332 0 | 22 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee | SSL VLM | 105 3,630 0 | 06 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts | Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman | - | 36 387 0 | 31 Jul 2019
Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval | S. Balke, Matthias Dorfer, Luis Carvalho, A. Arzt, Gerhard Widmer | - | 19 11 0 | 26 Jun 2019
Evolving Losses for Unlabeled Video Representation Learning | A. Piergiovanni, A. Angelova, Michael S. Ryoo | SSL | 11 7 0 | 07 Jun 2019
Learning Representations by Maximizing Mutual Information Across Views | Philip Bachman, R. Devon Hjelm, William Buchwalter | SSL | 72 1,457 0 | 03 Jun 2019
How Much Does Audio Matter to Recognize Egocentric Object Interactions? | Alejandro Cartas, Jordi Luque, Petia Radeva, Carlos Segura, Mariella Dimiccoli | EgoV | 17 6 0 | 03 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard? | Weiyao Wang, Du Tran, Matt Feiszli | - | 28 442 0 | 29 May 2019
Data-Efficient Image Recognition with Contrastive Predictive Coding | Olivier J. Hénaff, A. Srinivas, J. Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord | SSL | 58 1,417 0 | 22 May 2019
Machine learning in acoustics: theory and applications | Michael J. Bianco, Peter Gerstoft, James Traer, Emma Ozanich, M. Roch, Sharon Gannot, Charles-Alban Deledalle | AI4CE | 28 376 0 | 11 May 2019
Scaling and Benchmarking Self-Supervised Visual Representation Learning | Priya Goyal, D. Mahajan, Abhinav Gupta, Ishan Misra | SSL | 24 396 0 | 03 May 2019
Audio-Visual Model Distillation Using Acoustic Images | Andrés F. Pérez, Valentina Sanguineti, Pietro Morerio, Vittorio Murino | VLM | 15 27 0 | 16 Apr 2019
The Sound of Motions | Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba | - | 17 251 0 | 11 Apr 2019
2.5D Visual Sound | Ruohan Gao, Kristen Grauman | VGen | 27 130 0 | 11 Dec 2018
Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features | S. Palazzo, C. Spampinato, I. Kavasidis, D. Giordano, Joseph Schmidt, M. Shah | - | 127 111 0 | 25 Oct 2018
Scattering Networks for Hybrid Representation Learning | Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, N. Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky | - | 21 84 0 | 17 Sep 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild | Samuel Albanie, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman | CVBM | 30 270 0 | 16 Aug 2018
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang | CVBM | 31 436 0 | 20 Jul 2018
Spatio-Temporal Channel Correlation Networks for Action Classification | Ali Diba, Mohsen Fayyaz, Vivek Sharma, M. M. Arzani, Rahman Yousefzadeh, Juergen Gall, Luc Van Gool | 3DPC | 26 181 0 | 19 Jun 2018
Playing hard exploration games by watching YouTube | Y. Aytar, Tobias Pfaff, David Budden, T. Paine, Ziyun Wang, Nando de Freitas | - | 35 269 0 | 29 May 2018
Weakly-supervised Visual Instrument-playing Action Detection in Videos | Jen-Yu Liu, Yi-Hsuan Yang, Shyh-Kang Jeng | - | 21 13 0 | 05 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity | Arsha Nagrani, Samuel Albanie, Andrew Zisserman | SSL | 41 140 0 | 02 May 2018
Randomly weighted CNNs for (music) audio classification | Jordi Pons, Xavier Serra | - | 19 85 0 | 01 May 2018
Adaptive pooling operators for weakly labeled sound event detection | Brian McFee, Justin Salamon, J. P. Bello | - | 27 148 0 | 26 Apr 2018
Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events | Sanjeel Parekh, S. Essid, A. Ozerov, Ngoc Q. K. Duong, P. Pérez, G. Richard | SSL | 8 19 0 | 19 Apr 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features | Andrew Owens, Alexei A. Efros | SSL | 51 745 0 | 10 Apr 2018
The Sound of Pixels | Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh H. McDermott, Antonio Torralba | VLM | 22 529 0 | 09 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Antoine Miech, Ivan Laptev, Josef Sivic | - | 22 233 0 | 07 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching | Arsha Nagrani, Samuel Albanie, Andrew Zisserman | CVBM | 22 219 0 | 01 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos | Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu | - | 36 426 0 | 23 Mar 2018
Moments in Time Dataset: one million videos for event understanding | Mathew Monfort, A. Andonian, Bolei Zhou, K. Ramakrishnan, Sarah Adel Bargal, ..., L. Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, A. Oliva | - | 47 538 0 | 09 Jan 2018
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning | Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba | SSL | 41 177 0 | 20 Dec 2017
Objects that Sound | Relja Arandjelović, Andrew Zisserman | ObjD VOS | 44 528 0 | 18 Dec 2017