1911.12667
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
28 November 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
Papers citing
"Self-Supervised Learning by Cross-Modal Audio-Video Clustering"
50 / 111 papers shown
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
137
1,129
0
23 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
35
106
0
02 Mar 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
41
10
0
15 Feb 2022
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
21
56
0
14 Feb 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
18
25
0
13 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
32
7
0
02 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
M. Pantic
CVBM
40
126
0
18 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
29
108
0
13 Jan 2022
Robust Contrastive Learning against Noisy Views
Ching-Yao Chuang
R. Devon Hjelm
Xin Wang
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Yale Song
NoLa
13
68
0
12 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
46
305
0
05 Jan 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
24
41
0
22 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Yin Cui
AI4TS
61
6
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
34
128
0
08 Dec 2021
Time-Equivariant Contrastive Video Representation Learning
Simon Jenni
Hailin Jin
SSL
AI4TS
143
60
0
07 Dec 2021
TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning
Yang Liu
Keze Wang
Lingbo Liu
Hao Lan
Liang Lin
SSL
AI4TS
53
113
0
07 Dec 2021
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Michael S. Ryoo
ViT
39
84
0
02 Dec 2021
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation
Dipika Singhania
R. Rahaman
Angela Yao
27
23
0
02 Dec 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
12
51
0
25 Nov 2021
Latent Structure Mining with Contrastive Modality Fusion for Multimedia Recommendation
Jinghao Zhang
Yanqiao Zhu
Qiang Liu
Mengqi Zhang
Shu Wu
Liang Wang
22
34
0
01 Nov 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
33
268
0
21 Oct 2021
Constrained Mean Shift for Representation Learning
Ajinkya Tejankar
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
SSL
45
0
0
19 Oct 2021
Self-Supervised Representation Learning: Introduction, Advances and Challenges
Linus Ericsson
Henry Gouk
Chen Change Loy
Timothy M. Hospedales
SSL
OOD
AI4TS
34
273
0
18 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
34
0
0
13 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Chongjian Ge
Youwei Liang
Yibing Song
Jianbo Jiao
Jue Wang
Ping Luo
ViT
24
36
0
11 Oct 2021
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding
Maomao Li
Tianyu Yang
Rui Qian
Haohang Xu
Qingyi Chen
Jue Wang
Hongkai Xiong
SSL
28
49
0
30 Sep 2021
Multi-level Feature Learning for Contrastive Multi-view Clustering
Jie Xu
Huayi Tang
Yazhou Ren
Liang Peng
Xiao-lan Zhu
Lifang He
32
161
0
21 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
40
11
0
18 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
M. Pantic
SSL
24
53
0
16 Jun 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
40
9
0
05 Jun 2021
WiCluster: Passive Indoor 2D/3D Positioning using WiFi without Precise Labels
I. Karmanov
F. G. Zanjani
S. Merlin
I. Kadampot
Daniel Dijkman
19
14
0
31 May 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
64
96
0
17 May 2021
CoCon: Cooperative-Contrastive Learning
Nishant Rai
Ehsan Adeli
Kuan-Hui Lee
Adrien Gaidon
Juan Carlos Niebles
SSL
20
18
0
30 Apr 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
39
257
0
29 Apr 2021
Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data
Xu Jia
Kai Han
Yukun Zhu
Bradley Green
152
57
0
26 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
55
0
13 Apr 2021
Cross-Modal learning for Audio-Visual Video Parsing
Jatin Lamba
Abhishek
Jayaprakash Akula
Rishabh Dabral
P. Jyothi
Ganesh Ramakrishnan
13
7
0
03 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Bo Xiong
Haoqi Fan
Kristen Grauman
Christoph Feichtenhofer
SSL
22
49
0
01 Apr 2021
Composable Augmentation Encoding for Video Representation Learning
Chen Sun
Arsha Nagrani
Yonglong Tian
Cordelia Schmid
SSL
AI4TS
37
17
0
01 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
27
34
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
33
127
0
30 Mar 2021
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization
Mengmeng Xu
Juan-Manuel Perez-Rua
Xiatian Zhu
Bernard Ghanem
Brais Martinez
15
27
0
28 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
29
33
0
18 Mar 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
Kunpeng Li
Zizhao Zhang
Guanhang Wu
Xuehan Xiong
Chen-Yu Lee
Zhichao Lu
Y. Fu
Tomas Pfister
29
5
0
11 Jan 2021
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
38
185
0
11 Dec 2020
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Bernard Ghanem
33
123
0
23 Nov 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning
Zehua Zhang
David J. Crandall
AI4TS
SSL
28
23
0
23 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
27
121
0
03 Nov 2020
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags
Xavier Favory
K. Drossos
Tuomas Virtanen
Xavier Serra
32
15
0
27 Oct 2020
Hard Negative Mixing for Contrastive Learning
Yannis Kalantidis
Mert Bulent Sariyildiz
Noé Pion
Philippe Weinzaepfel
Diane Larlus
SSL
53
628
0
02 Oct 2020
Understanding Self-supervised Learning with Dual Deep Networks
Yuandong Tian
Lantao Yu
Xinlei Chen
Surya Ganguli
SSL
13
78
0
01 Oct 2020