Look, Listen and Learn

23 May 2017
Relja Arandjelović
Andrew Zisserman
    SSL

Papers citing "Look, Listen and Learn"

38 / 238 papers shown
Self-labelling via simultaneous clustering and representation learning
Yuki M. Asano
Christian Rupprecht
Andrea Vedaldi
SSL
42
761
0
13 Nov 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
35
88
0
24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
32
41
0
19 Oct 2019
Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
14
32
0
18 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
35
91
0
30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
105
3,630
0
06 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
36
387
0
31 Jul 2019
Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval
S. Balke
Matthias Dorfer
Luis Carvalho
A. Arzt
Gerhard Widmer
19
11
0
26 Jun 2019
Evolving Losses for Unlabeled Video Representation Learning
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
11
7
0
07 Jun 2019
Learning Representations by Maximizing Mutual Information Across Views
Philip Bachman
R. Devon Hjelm
William Buchwalter
SSL
72
1,457
0
03 Jun 2019
How Much Does Audio Matter to Recognize Egocentric Object Interactions?
Alejandro Cartas
Jordi Luque
Petia Radeva
Carlos Segura
Mariella Dimiccoli
EgoV
17
6
0
03 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
28
442
0
29 May 2019
Data-Efficient Image Recognition with Contrastive Predictive Coding
Olivier J. Hénaff
A. Srinivas
J. Fauw
Ali Razavi
Carl Doersch
S. M. Ali Eslami
Aaron van den Oord
SSL
58
1,417
0
22 May 2019
Machine learning in acoustics: theory and applications
Michael J. Bianco
Peter Gerstoft
James Traer
Emma Ozanich
M. Roch
Sharon Gannot
Charles-Alban Deledalle
AI4CE
28
376
0
11 May 2019
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Priya Goyal
D. Mahajan
Abhinav Gupta
Ishan Misra
SSL
24
396
0
03 May 2019
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
17
251
0
11 Apr 2019
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
27
130
0
11 Dec 2018
Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features
S. Palazzo
C. Spampinato
I. Kavasidis
D. Giordano
Joseph Schmidt
M. Shah
127
111
0
25 Oct 2018
Scattering Networks for Hybrid Representation Learning
Edouard Oyallon
Sergey Zagoruyko
Gabriel Huang
N. Komodakis
Simon Lacoste-Julien
Matthew Blaschko
Eugene Belilovsky
21
84
0
17 Sep 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
30
270
0
16 Aug 2018
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Hang Zhou
Yu Liu
Ziwei Liu
Ping Luo
Xiaogang Wang
CVBM
31
436
0
20 Jul 2018
Spatio-Temporal Channel Correlation Networks for Action Classification
Ali Diba
Mohsen Fayyaz
Vivek Sharma
M. M. Arzani
Rahman Yousefzadeh
Juergen Gall
Luc Van Gool
3DPC
26
181
0
19 Jun 2018
Playing hard exploration games by watching YouTube
Y. Aytar
Tobias Pfaff
David Budden
T. Paine
Ziyun Wang
Nando de Freitas
35
269
0
29 May 2018
Weakly-supervised Visual Instrument-playing Action Detection in Videos
Jen-Yu Liu
Yi-Hsuan Yang
Shyh-Kang Jeng
21
13
0
05 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
41
140
0
02 May 2018
Randomly weighted CNNs for (music) audio classification
Jordi Pons
Xavier Serra
19
85
0
01 May 2018
Adaptive pooling operators for weakly labeled sound event detection
Brian McFee
Justin Salamon
J. P. Bello
27
148
0
26 Apr 2018
Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events
Sanjeel Parekh
S. Essid
A. Ozerov
Ngoc Q. K. Duong
P. Pérez
G. Richard
SSL
8
19
0
19 Apr 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
51
745
0
10 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
529
0
09 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Antoine Miech
Ivan Laptev
Josef Sivic
22
233
0
07 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
22
219
0
01 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
36
426
0
23 Mar 2018
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
47
538
0
09 Jan 2018
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
Andrew Owens
Jiajun Wu
Josh H. McDermott
William T. Freeman
Antonio Torralba
SSL
41
177
0
20 Dec 2017
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
44
528
0
18 Dec 2017