ResearchTrend.AI
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

30 June 2018
Bruno Korbar
Du Tran
Lorenzo Torresani

Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"

50 / 316 papers shown
Title
Hyperbolic Audio-visual Zero-shot Learning
Jie Hong
Zeeshan Hayder
Junlin Han
Pengfei Fang
Mehrtash Harandi
L. Petersson
87
16
0
24 Aug 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
111
18
0
22 Aug 2023
MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization
Tao Chen
Zexiong Lin
Hui Li
Jiayi Ji
Yiyi Zhou
Guanbin Li
Rongrong Ji
81
0
0
22 Aug 2023
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
Tianyu Liu
Peng Zhang
Wei Huang
Yufei Zha
Tao You
Yanni Zhang
SSL
67
2
0
09 Aug 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL, EgoV
114
4
0
10 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
125
9
0
05 Jul 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
Zengjie Song
Zhaoxiang Zhang
57
1
0
19 Jun 2023
A Large-Scale Analysis on Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
99
3
0
09 Jun 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
123
9
0
01 Jun 2023
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
79
22
0
30 May 2023
LANISTR: Multimodal Learning from Structured and Unstructured Data
Sayna Ebrahimi
Sercan O. Arik
Yihe Dong
Tomas Pfister
74
4
0
26 May 2023
How does Contrastive Learning Organize Images?
Yunzhe Zhang
Yao Lu
Qi Xuan
SSL
83
1
0
17 May 2023
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Chen Chen
Heqing Zou
Qiu-shi Zhu
Eng Siong Chng
103
8
0
16 May 2023
Self-Supervised Video Representation Learning via Latent Time Navigation
Di Yang
Yaohui Wang
Quan Kong
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
SSL, AI4TS
87
11
0
10 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
109
8
0
06 May 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
134
2
0
12 Apr 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
127
50
0
31 Mar 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Shentong Mo
Yapeng Tian
86
43
0
29 Mar 2023
Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning
Aapo Hyvarinen
Ilyes Khemakhem
H. Morioka
CML, OOD
136
37
0
29 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
79
16
0
28 Mar 2023
Curricular Contrastive Regularization for Physics-aware Single Image Dehazing
Yupei Zheng
Jiahui Zhan
Shengfeng He
Junyu Dong
Yong Du
127
126
0
24 Mar 2023
Egocentric Audio-Visual Object Localization
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
77
35
0
23 Mar 2023
DrasCLR: A Self-supervised Framework of Learning Disease-related and Anatomy-specific Representation for 3D Medical Images
K. Yu
Li Sun
Junxiang Chen
Maxwell Reynolds
Tigmanshu Chaudhary
Kayhan Batmanghelich
109
1
0
21 Feb 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
Simon Jenni
Alexander Black
John Collomosse
SSL
82
16
0
15 Feb 2023
Confidence-Aware Calibration and Scoring Functions for Curriculum Learning
Shuang Ao
Stefan Rueger
Advaith Siddharthan
UQCV
87
0
0
29 Jan 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
97
21
0
23 Jan 2023
Novel-View Acoustic Synthesis
Changan Chen
Alexander Richard
Roman Shapovalov
V. Ithapu
Natalia Neverova
Kristen Grauman
Andrea Vedaldi
90
38
0
20 Jan 2023
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Jie Gui
Tuo Chen
Jing Zhang
Qiong Cao
Zhe Sun
Haoran Luo
Dacheng Tao
244
161
0
13 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
123
4
0
05 Jan 2023
EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding
Shuhan Tan
Tushar Nagarajan
Kristen Grauman
96
22
0
05 Jan 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Chao Feng
Ziyang Chen
Andrew Owens
95
78
0
04 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Guohao Li
AAML
141
9
0
03 Jan 2023
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
102
55
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
128
78
0
15 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM, CLIP
92
29
0
14 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
96
49
0
12 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
129
45
0
09 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Yating Xu
Conghui Hu
G. Lee
VGen
122
0
0
09 Dec 2022
Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
Jing-Xuan Zhang
Genshun Wan
Zhenhua Ling
Jia Pan
Jianqing Gao
Cong Liu
SSL
88
13
0
06 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
127
0
0
05 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
98
52
0
28 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
116
23
0
25 Nov 2022
Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Dan Guo
Meng Wang
124
54
0
18 Nov 2022
Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization
Junru Wu
Yi Liang
Feng Han
Hassan Akbari
Zhangyang Wang
Cong Yu
77
10
0
03 Nov 2022
MarginNCE: Robust Sound Localization with a Negative Margin
Sooyoung Park
Arda Senocak
Joon Son Chung
SSL
80
14
0
03 Nov 2022
Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation
Zeyun Zhong
David Schneider
Michael Voit
Rainer Stiefelhagen
Jürgen Beyerer
129
47
0
23 Oct 2022
ViFiCon: Vision and Wireless Association Via Self-Supervised Contrastive Learning
Nicholas Meegan
Hansi Liu
Bryan Bo Cao
Abrar Alali
Kristin J. Dana
Marco Gruteser
Shubham Jain
A. Ashok
61
1
0
11 Oct 2022
HiCo: Hierarchical Contrastive Learning for Ultrasound Video Model Pretraining
Chunhui Zhang
Yixiong Chen
Li Liu
Qiong Liu
Xiaoping Zhou
VLM
116
9
0
10 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
130
129
0
02 Oct 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
137
31
0
28 Sep 2022