Look, Listen and Learn

23 May 2017
Relja Arandjelović
Andrew Zisserman
    SSL
arXiv: 1705.08168

Papers citing "Look, Listen and Learn"

50 / 238 papers shown
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
71
45
0
26 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
227
2,434
0
04 Jan 2021
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
ViT
29
66
0
11 Dec 2020
Game Plan: What AI can do for Football, and What Football can do for AI
K. Tuyls
Shayegan Omidshafiei
Paul Muller
Zhe Wang
Jerome T. Connor
...
Simon Bouton
Nathalie Beauguerlange
Jackson Broshear
T. Graepel
Demis Hassabis
44
78
0
18 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
27
121
0
03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
39
69
0
02 Nov 2020
Multimodal and self-supervised representation learning for automatic gesture recognition in surgical robotics
Aniruddha Tamhane
J. Wu
Mathias Unberath
SSL
6
0
0
31 Oct 2020
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL
AI4TS
184
687
0
10 Oct 2020
Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space
Sicheng Zhao
Yaxian Li
Xingxu Yao
Weizhi Nie
Pengfei Xu
Jufeng Yang
Kurt Keutzer
19
29
0
22 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
36
106
0
13 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
253
0
10 Aug 2020
Assisting Scene Graph Generation with Self-Supervision
Sandeep Inuganti
V. Balasubramanian
SSL
16
7
0
08 Aug 2020
Data Cleansing with Contrastive Learning for Vocal Note Event Annotations
Gabriel Meseguer-Brocal
Rachel M. Bittner
Simon Durand
B. Brost
34
6
0
05 Aug 2020
Self-supervised Learning of Point Clouds via Orientation Estimation
Omid Poursaeed
Tianxing Jiang
Quintessa Qiao
N. Xu
Vladimir G. Kim
3DPC
SSL
11
116
0
01 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
40
48
0
29 Jul 2020
Augmentation adversarial training for self-supervised speaker recognition
Jaesung Huh
Hee-Soo Heo
Jingu Kang
Shinji Watanabe
Joon Son Chung
SSL
48
76
0
23 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
34
180
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
32
81
0
20 Jul 2020
MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection
Fa-Ting Hong
Xuanteng Huang
Weihong Li
Weishi Zheng
10
61
0
20 Jul 2020
Generating Visually Aligned Sound from Videos
Peihao Chen
Yang Zhang
Mingkui Tan
Hongdong Xiao
Deng Huang
Chuang Gan
VGen
24
95
0
14 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
28
155
0
13 Jul 2020
Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines
Jordan J. Bird
Diego Resende Faria
C. Premebida
Anikó Ekárt
George Vogiatzis
18
13
0
11 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
372
0
29 Jun 2020
Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning
Yuan Yao
Chang-rui Liu
Dezhao Luo
Yu Zhou
QiXiang Ye
29
169
0
20 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations
Xavier Favory
K. Drossos
Tuomas Virtanen
Xavier Serra
16
32
0
15 Jun 2020
Towards Robust Pattern Recognition: A Review
Xu-Yao Zhang
Cheng-Lin Liu
C. Suen
OOD
HAI
19
103
0
12 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
24
75
0
11 Jun 2020
Self-supervised Learning from a Multi-view Perspective
Yao-Hung Hubert Tsai
Yue Wu
Ruslan Salakhutdinov
Louis-Philippe Morency
SSL
25
30
0
10 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
22
23
0
04 Jun 2020
Deep Learning for Insider Threat Detection: Review, Challenges and Opportunities
Shuhan Yuan
Xintao Wu
AAML
20
157
0
25 May 2020
S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation
Yizhe Zhu
Martin Renqiang Min
Asim Kadav
H. Graf
CoGe
DRL
32
95
0
23 May 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
171
84
0
04 May 2020
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
30
38
0
08 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
33
54
0
30 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
17
121
0
06 Mar 2020
Learning Representations by Predicting Bags of Visual Words
Spyros Gidaris
Andrei Bursuc
N. Komodakis
P. Pérez
Matthieu Cord
SSL
28
117
0
27 Feb 2020
Evolving Losses for Unsupervised Video Representation Learning
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
27
138
0
26 Feb 2020
An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments
Javier Naranjo-Alcazar
Sergi Perez-Castanos
P. Zuccarello
Ana M. Torres
Jose J. Lopez
Francesc J. Ferri
M. Cobos
23
15
0
26 Feb 2020
Towards Learning a Universal Non-Semantic Representation of Speech
Joel Shor
A. Jansen
Ronnie Maor
Oran Lang
Omry Tuval
Félix de Chaumont Quitry
Marco Tagliasacchi
Ira Shavitt
Dotan Emanuel
Yinnon A. Haviv
SSL
44
155
0
25 Feb 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
Sanchita Ghose
John J. Prevost
VGen
14
46
0
21 Feb 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
21
88
0
20 Feb 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
27
73
0
09 Jan 2020
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning
Dezhao Luo
Chang-rui Liu
Yu Zhou
Dongbao Yang
Can Ma
QiXiang Ye
Weiping Wang
SSL
25
160
0
02 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
42
428
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
33
52
0
20 Nov 2019