1807.00230
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
30 June 2018
Bruno Korbar
Du Tran
Lorenzo Torresani
Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"
50 / 316 papers shown
Active Contrastive Learning of Audio-Visual Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
VLM
SSL
62
8
0
31 Aug 2020
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
Jiangliu Wang
Jianbo Jiao
Linchao Bao
Shengfeng He
Wei Liu
Yunhui Liu
SSL
AI4TS
75
55
0
31 Aug 2020
Self-supervised Video Representation Learning by Pace Prediction
Jiangliu Wang
Jianbo Jiao
Yunhui Liu
SSL
AI4TS
103
237
0
13 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
150
110
0
13 Aug 2020
Spatiotemporal Contrastive Video Representation Learning
Rui Qian
Tianjian Meng
Boqing Gong
Ming-Hsuan Yang
Haoran Wang
Serge J. Belongie
Huayu Chen
SSL
AI4TS
180
502
0
09 Aug 2020
Memory-augmented Dense Predictive Coding for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
149
242
0
03 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
115
48
0
29 Jul 2020
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events
Chuang Gan
Xiaoyu Chen
Phillip Isola
Antonio Torralba
J. Tenenbaum
70
7
0
27 Jul 2020
Video Representation Learning by Recognizing Temporal Transformations
Simon Jenni
Givi Meishvili
Paolo Favaro
203
135
0
21 Jul 2020
CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language
Jiasong Wu
Xuan Li
Taotao Li
Fanman Meng
Youyong Kong
Guanyu Yang
L. Senhadji
Huazhong Shu
CVBM
97
0
0
21 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
151
185
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
103
84
0
20 Jul 2020
Leveraging Category Information for Single-Frame Visual Sound Source Separation
Lingyu Zhu
Esa Rahtu
89
9
0
15 Jul 2020
Do We Need Sound for Sound Source Localization?
Takashi Oya
Shohei Iwase
Ryota Natsume
Takahiro Itazuri
Shugo Yamaguchi
Shigeo Morishima
55
22
0
11 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
219
375
0
29 Jun 2020
Video Representation Learning with Visual Tempo Consistency
Ceyuan Yang
Yinghao Xu
Bo Dai
Bolei Zhou
75
92
0
28 Jun 2020
Space-Time Correspondence as a Contrastive Random Walk
Allan Jabri
Andrew Owens
Alexei A. Efros
SSL
OT
170
304
0
25 Jun 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision
Yuki M. Asano
Mandela Patrick
Christian Rupprecht
Andrea Vedaldi
SSL
151
152
0
24 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
107
142
0
16 Jun 2020
Solos: A Dataset for Audio-Visual Music Analysis
Juan F. Montesinos
Olga Slizovskaia
G. Haro
67
11
0
14 Jun 2020
DTG-Net: Differentiated Teachers Guided Self-Supervised Video Action Recognition
Ziming Liu
Guangyu Gao
A. K. Qin
Jinyang Li
ViT
62
1
0
13 Jun 2020
Video Understanding as Machine Translation
Bruno Korbar
Fabio Petroni
Rohit Girdhar
Lorenzo Torresani
SSL
99
29
0
12 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
118
76
0
11 Jun 2020
Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data
Haytham M. Fayek
Anurag Kumar
103
36
0
29 May 2020
Self-supervised Modal and View Invariant Feature Learning
Longlong Jing
Yucheng Chen
Ling Zhang
Mingyi He
Yingli Tian
3DPC
SSL
63
29
0
28 May 2020
End-to-End Lip Synchronisation Based on Pattern Classification
You Jin Kim
Hee-Soo Heo
Soo-Whan Chung
Bong-Jin Lee
CVBM
47
0
0
18 May 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
Abhinav Shukla
Stavros Petridis
Maja Pantic
SSL
109
28
0
04 May 2020
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
Soo-Whan Chung
Hong-Goo Kang
Joon Son Chung
SSL
57
39
0
29 Apr 2020
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Pedro Morgado
Nuno Vasconcelos
Ishan Misra
SSL
98
276
0
27 Apr 2020
Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences
Longlong Jing
Yucheng Chen
Ling Zhang
Mingyi He
Yingli Tian
3DPC
SSL
78
34
0
13 Apr 2020
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
74
40
0
08 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
93
54
0
30 Mar 2020
Watching the World Go By: Representation Learning from Unlabeled Videos
Daniel Gordon
Kiana Ehsani
Dieter Fox
Ali Farhadi
SSL
AI4TS
102
90
0
18 Mar 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
109
50
0
09 Mar 2020
Cross-modal Learning for Multi-modal Video Categorization
Palash Goyal
Saurabh Sahu
Shalini Ghosh
Chul Lee
79
9
0
07 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
152
126
0
06 Mar 2020
VideoSSL: Semi-Supervised Learning for Video Classification
Longlong Jing
T. Parag
Zhe Wu
Yingli Tian
Hongcheng Wang
71
52
0
29 Feb 2020
Evolving Losses for Unsupervised Video Representation Learning
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
89
140
0
26 Feb 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
99
88
0
20 Feb 2020
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
Jonathan Munro
Dima Damen
EgoV
99
196
0
27 Jan 2020
Curriculum Audiovisual Learning
Di Hu
Zechuan Wang
Haoyi Xiong
Dong Wang
Feiping Nie
Dejing Dou
SSL
74
32
0
26 Jan 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
310
209
0
23 Jan 2020
Learning Spatiotemporal Features via Video and Text Pair Discrimination
Tianhao Li
Limin Wang
VGen
81
57
0
16 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
90
162
0
14 Jan 2020
Visually Guided Self Supervised Learning of Speech Representations
Abhinav Shukla
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Maja Pantic
SSL
87
25
0
13 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
108
73
0
09 Jan 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
164
713
0
13 Dec 2019
Self-Supervised Learning of Pretext-Invariant Representations
Ishan Misra
Laurens van der Maaten
SSL
VLM
149
1,461
0
04 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
219
433
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
86
55
0
20 Nov 2019