
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning


22 April 2021
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata

Papers citing "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"

50 / 52 papers shown
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
Hilde Kuehne
106
0
0
02 May 2025
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
Rajath Rao
Adithya Ganesan
Oscar Kjell
Jonah Luby
Akshay Raghavan
...
B. Luft
Camilo Ruggero
Neville Ryant
R. Kotov
H. Andrew Schwartz
105
0
0
15 Jan 2025
Contrastive Learning for Unpaired Image-to-Image Translation
Taesung Park
Alexei A. Efros
Richard Y. Zhang
Jun-Yan Zhu
SSL
89
1,235
0
30 Jul 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
ViT
542
610
0
21 Jul 2020
Heterogeneous Knowledge Distillation using Information Flow Modeling
Nikolaos Passalis
Maria Tzelepi
Anastasios Tefas
75
139
0
02 May 2020
VGGSound: A Large-scale Audio-Visual Dataset
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
92
583
0
29 Apr 2020
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
508
3,449
0
09 Mar 2020
Learning Robust Representations via Multi-View Information Bottleneck
Marco Federici
Anjan Dutta
Patrick Forré
Nate Kushman
Zeynep Akata
SLR
67
258
0
17 Feb 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
393
18,897
0
13 Feb 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM, SSL
199
1,084
0
21 Dec 2019
ASR is all you need: cross-modal distillation for lip reading
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
58
135
0
28 Nov 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
122
432
0
28 Nov 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
142
88
0
24 Oct 2019
Contrastive Representation Distillation
Yonglong Tian
Dilip Krishnan
Phillip Isola
176
1,054
0
23 Oct 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
84
389
0
31 Jul 2019
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
182
2,412
0
13 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
154
453
0
29 May 2019
Temporal Cycle-Consistency Learning
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
P. Sermanet
Andrew Zisserman
SSL, AI4TS
92
276
0
16 Apr 2019
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
131
210
0
16 Apr 2019
Relational Knowledge Distillation
Wonpyo Park
Dongju Kim
Yan Lu
Minsu Cho
89
1,428
0
10 Apr 2019
Correlation Congruence for Knowledge Distillation
Baoyun Peng
Xiao Jin
Jiaheng Liu
Shunfeng Zhou
Yichao Wu
Yu Liu
Dongsheng Li
Zhaoning Zhang
94
513
0
03 Apr 2019
Learning Correspondence from the Cycle-Consistency of Time
Xiaolong Wang
Allan Jabri
Alexei A. Efros
SSL
91
491
0
18 Mar 2019
DistInit: Learning Video Representations Without a Single Labeled Video
Rohit Girdhar
Du Tran
Lorenzo Torresani
Deva Ramanan
48
54
0
26 Jan 2019
Composing Text and Image for Image Retrieval - An Empirical Odyssey
Nam S. Vo
Lu Jiang
Chen Sun
Kevin Patrick Murphy
Li-Jia Li
Li Fei-Fei
James Hays
CoGe
68
368
0
18 Dec 2018
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
169
3,286
0
10 Dec 2018
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles
Dahun Kim
Donghyeon Cho
In So Kweon
SSL
91
349
0
24 Nov 2018
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
98
710
0
06 Sep 2018
Learning deep representations by mutual information estimation and maximization
R. Devon Hjelm
A. Fedorov
Samuel Lavoie-Marchildon
Karan Grewal
Phil Bachman
Adam Trischler
Yoshua Bengio
SSL, DRL
352
2,675
0
20 Aug 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
75
271
0
16 Aug 2018
Improving Spatiotemporal Self-Supervision by Deep Reinforcement Learning
U. Büchler
Biagio Brattoli
Bjorn Ommer
OOD, SSL
83
114
0
30 Jul 2018
X2Face: A network for controlling face generation by using images, audio, and pose codes
Olivia Wiles
A. Sophia Koepke
Andrew Zisserman
CVBM
91
415
0
27 Jul 2018
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord
Yazhe Li
Oriol Vinyals
DRL, SSL
356
10,369
0
10 Jul 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
100
754
0
10 Apr 2018
Learning Deep Representations with Probabilistic Knowledge Transfer
Nikolaos Passalis
Anastasios Tefas
66
413
0
28 Mar 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
109
439
0
23 Mar 2018
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
Andrew Owens
Jiajun Wu
Josh H. McDermott
William T. Freeman
Antonio Torralba
SSL
71
176
0
20 Dec 2017
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD, VOS
116
530
0
18 Dec 2017
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
240
3,033
0
30 Nov 2017
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
Zhaofan Qiu
Ting Yao
Tao Mei
102
1,663
0
28 Nov 2017
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
Kensho Hara
Hirokatsu Kataoka
Y. Satoh
3DPC
133
1,935
0
27 Nov 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
240
8,045
0
22 May 2017
You said that?
Joon Son Chung
A. Jamaludin
Andrew Zisserman
CVBM
74
260
0
08 May 2017
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Sergey Zagoruyko
N. Komodakis
147
2,590
0
12 Dec 2016
SoundNet: Learning Sound Representations from Unlabeled Video
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
135
1,044
0
27 Oct 2016
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
120
3,841
0
02 Aug 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.3K
194,641
0
10 Dec 2015
Cross Modal Distillation for Supervision Transfer
Saurabh Gupta
Judy Hoffman
Jitendra Malik
127
538
0
02 Jul 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
367
19,745
0
09 Mar 2015
FitNets: Hints for Thin Deep Nets
Adriana Romero
Nicolas Ballas
Samira Ebrahimi Kahou
Antoine Chassang
C. Gatta
Yoshua Bengio
FedML
328
3,906
0
19 Dec 2014
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
261
7,545
0
09 Jun 2014