arXiv:1807.00230
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

30 June 2018
Bruno Korbar
Du Tran
Lorenzo Torresani

Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"

50 / 316 papers shown
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
80
26
0
13 Feb 2022
Audio-Visual Fusion Layers for Event Type Aware Video Recognition
Arda Senocak
Junsik Kim
Tae-Hyun Oh
H. Ryu
Dingzeyu Li
In So Kweon
87
1
0
12 Feb 2022
Real-time Emergency Vehicle Event Detection Using Audio Data
Zubayer Islam
Mohamed Abdel-Aty
28
6
0
03 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
Maja Pantic
CVBM
121
133
0
18 Jan 2022
SS-3DCapsNet: Self-supervised 3D Capsule Networks for Medical Segmentation on Less Labeled Data
Minh-Khoi Tran
Loi Ly
Binh-Son Hua
Ngan Le
3DPC MedIm
81
17
0
15 Jan 2022
Robust Contrastive Learning against Noisy Views
Ching-Yao Chuang
R. Devon Hjelm
Xin Eric Wang
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Ya-heng Song
NoLa
66
72
0
12 Jan 2022
Progressive Video Summarization via Multimodal Self-supervised Learning
Haopeng Li
Qiuhong Ke
Mingming Gong
Tom Drummond
AI4TS
80
19
0
07 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
143
321
0
05 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks
A. Vasudevan
Dengxin Dai
Luc Van Gool
SSL
90
6
0
04 Jan 2022
Decompose the Sounds and Pixels, Recompose the Events
Varshanth R. Rao
Md Ibrahim Khalil
Haoda Li
Peng Dai
Juwei Lu
58
5
0
21 Dec 2021
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
Yanpeng Zhao
Jack Hessel
Youngjae Yu
Ximing Lu
Rowan Zellers
Yejin Choi
129
27
0
16 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
79
23
0
09 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Huayu Chen
AI4TS
106
6
0
08 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
127
40
0
08 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning
Manlin Zhang
Jinpeng Wang
A. J. Ma
85
9
0
07 Dec 2021
Time-Equivariant Contrastive Video Representation Learning
Simon Jenni
Hailin Jin
SSL AI4TS
212
61
0
07 Dec 2021
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup
Siyuan Li
Zicheng Liu
Zedong Wang
Di Wu
Zihan Liu
Stan Z. Li
111
27
0
30 Nov 2021
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Eric Z. Xu
Zeyang Song
Satoshi Tsutsui
C. Feng
Mang Ye
Mike Zheng Shou
VGen
85
43
0
29 Nov 2021
NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes
Suhani Vora
Noha Radwan
Klaus Greff
H. Meyer
Kyle Genova
Mehdi S. M. Sajjadi
Etienne Pot
Andrea Tagliasacchi
Daniel Duckworth
166
127
0
25 Nov 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
110
62
0
24 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
82
21
0
15 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Ziyang Chen
Xixi Hu
Andrew Owens
121
26
0
10 Nov 2021
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
Pritam Sarkar
Ali Etemad
SSL
100
11
0
09 Nov 2021
Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing
Aadarsh Sahoo
Rutav Shah
Yikang Shen
Kate Saenko
Abir Das
84
65
0
28 Oct 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
80
8
0
26 Oct 2021
Learning 3D Semantic Segmentation with only 2D Image Supervision
Kyle Genova
Xiaoqi Yin
Abhijit Kundu
C. Pantofaru
Forrester Cole
Avneesh Sud
B. Brewington
B. Shucker
Thomas Funkhouser
3DPC
69
81
0
21 Oct 2021
Constrained Mean Shift for Representation Learning
Ajinkya Tejankar
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
SSL
58
0
0
19 Oct 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
126
35
0
19 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL AI4TS
68
0
0
13 Oct 2021
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
Zixu Zhao
Yueming Jin
Pheng-Ann Heng
SSL
90
21
0
28 Sep 2021
V-SlowFast Network for Efficient Visual Sound Separation
Lingyu Zhu
Esa Rahtu
116
10
0
18 Sep 2021
Learning Cross-modal Contrastive Features for Video Domain Adaptation
Donghyun Kim
Yi-Hsuan Tsai
Bingbing Zhuang
Xiang Yu
Stan Sclaroff
Kate Saenko
Manmohan Chandraker
92
73
0
26 Aug 2021
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning
Wei Feng
Yuanjiang Wang
Lihua Ma
Ye Yuan
Fangqiu Yi
SSL
55
13
0
24 Aug 2021
Self-Supervised Video Representation Learning with Meta-Contrastive Network
Yuanze Lin
Xun Guo
Yan Lu
SSL
78
41
0
19 Aug 2021
How Self-Supervised Learning Can be Used for Fine-Grained Head Pose Estimation?
Mahdi Pourmirzaei
Farzaneh Esmaili
G. Montazer
Sasan Karamizadeh
Seyedehsamaneh Shojaeilangari
62
0
0
10 Aug 2021
Learning to Cut by Watching Movies
Alejandro Pardo
Fabian Caba Heilbron
Juan Carlos León Alcázar
Ali K. Thabet
Guohao Li
VGen
127
20
0
09 Aug 2021
Video Contrastive Learning with Global Context
Haofei Kuang
Yi Zhu
Zhi-Li Zhang
Xinyu Li
Joseph Tighe
Sören Schwertfeger
C. Stachniss
Mu Li
SSL AI4TS
93
61
0
05 Aug 2021
Federated Self-Training for Semi-Supervised Audio Recognition
Vasileios Tsouvalas
Aaqib Saeed
T. Ozcelebi
FedML
88
16
0
14 Jul 2021
Self-Supervised Multi-Modal Alignment for Whole Body Medical Imaging
Rhydian Windsor
A. Jamaludin
T. Kadir
Andrew Zisserman
90
16
0
14 Jul 2021
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krähenbühl
VLM ViT
125
170
0
21 Jun 2021
Improving Multi-Modal Learning with Uni-Modal Teachers
Chenzhuang Du
Tingle Li
Yichen Liu
Zixin Wen
Tianyu Hua
Yue Wang
Hang Zhao
59
47
0
21 Jun 2021
Improving On-Screen Sound Separation for Open-Domain Videos with Audio-Visual Self-Attention
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
VLM
92
8
0
17 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
Maja Pantic
SSL
63
54
0
16 Jun 2021
Watching Too Much Television is Good: Self-Supervised Audio-Visual Representation Learning from Movies and TV Shows
Mahdi M. Kalayeh
Nagendra Kamath
Lingyi Liu
Ashok Chandrashekar
SSL
55
2
0
16 Jun 2021
Learning Audio-Visual Dereverberation
Changan Chen
Wei-Ju Sun
David Harwath
Kristen Grauman
95
32
0
14 Jun 2021
Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning
Shaobo Min
Qi Dai
Hongtao Xie
Chuang Gan
Yongdong Zhang
Jingdong Wang
SSL
72
7
0
13 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
96
212
0
03 Jun 2021
Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
127
12
0
03 Jun 2021
Automatic audiovisual synchronisation for ultrasound tongue imaging
Aciel Eshky
J. Cleland
M. Ribeiro
Eleanor Sugden
Korin Richmond
Steve Renals
30
7
0
31 May 2021
Home Action Genome: Cooperative Compositional Action Understanding
Nishant Rai
Haofeng Chen
Jingwei Ji
Rishi Desai
Kazuki Kozuka
Shun Ishizaka
Ehsan Adeli
Juan Carlos Niebles
45
78
0
11 May 2021