ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2207.06405
  4. Cited By
Masked Autoencoders that Listen

Masked Autoencoders that Listen

13 July 2022
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
ArXivPDFHTML

Papers citing "Masked Autoencoders that Listen"

50 / 58 papers shown
Title
Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks
Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks
Christos Plachouras
Julien Guinot
George Fazekas
Elio Quinton
Emmanouil Benetos
Johan Pauwels
125
1
0
09 May 2025
Can Masked Autoencoders Also Listen to Birds?
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
29
0
0
17 Apr 2025
Clustering and novel class recognition: evaluating bioacoustic deep learning feature extractors
Clustering and novel class recognition: evaluating bioacoustic deep learning feature extractors
Vincent S. Kather
Burooj Ghani
Dan Stowell
31
0
0
09 Apr 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
67
3
0
13 Mar 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
71
8
0
24 Feb 2025
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
76
0
0
20 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
69
0
0
05 Feb 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
74
7
0
10 Jan 2025
Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation
Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation
Ivan Rinaldi
Nicola Fanelli
Giovanna Castellano
G. Vessio
31
2
0
07 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
41
3
0
03 Oct 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
51
3
0
23 Sep 2024
High-Resolution Speech Restoration with Latent Diffusion Model
High-Resolution Speech Restoration with Latent Diffusion Model
Tushar Dhyani
Florian Lux
Michele Mancusi
Giorgio Fabbro
Fritz Hohl
Ngoc Thang Vu
DiffM
37
0
0
17 Sep 2024
Compositional Audio Representation Learning
Compositional Audio Representation Learning
Sripathi Sridhar
Mark Cartwright
AI4TS
35
0
0
15 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schluter
Paul Primus
Gerhard Widmer
ViT
25
2
0
14 Sep 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
K. Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
30
3
0
12 Jun 2024
DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and
  Missing Labels
DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels
Samuele Cornell
Janek Ebbers
Constance Douwes
Irene Martín-Morató
Manu Harju
A. Mesaros
Romain Serizel
37
13
0
12 Jun 2024
Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake
  Audio Detection
Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
Xiaopeng Wang
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
Yuankun Xie
...
Xuefei Liu
Yongwei Li
Xin Qi
Yi Lu
Shuchen Shi
33
4
0
05 Jun 2024
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion
  Recognition
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition
Peihao Xiang
Chaohao Lin
Kaida Wu
Ou Bai
34
3
0
28 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
Synthetic training set generation using text-to-audio models for
  environmental sound classification
Synthetic training set generation using text-to-audio models for environmental sound classification
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
35
2
0
26 Mar 2024
Multiscale Matching Driven by Cross-Modal Similarity Consistency for
  Audio-Text Retrieval
Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval
Qian Wang
Jia-Chen Gu
Zhen-Hua Ling
35
2
0
15 Mar 2024
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with
  Unsupervised Audio Mixtures
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
Afrina Tabassum
Dung N. Tran
Trung D. Q. Dang
Ismini Lourentzou
K. Koishida
47
0
0
14 Mar 2024
4M: Massively Multimodal Masked Modeling
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
44
63
0
11 Dec 2023
TabMT: Generating tabular data with masked transformers
TabMT: Generating tabular data with masked transformers
Manbir Gulati
Paul F. Roysdon
LMTD
45
33
0
11 Dec 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
41
251
0
21 Nov 2023
Improving Discriminative Multi-Modal Learning with Large-Scale
  Pre-Trained Models
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models
Chenzhuang Du
Yue Zhao
Chonghua Liao
Jiacheng You
Jie Fu
Hang Zhao
39
2
0
08 Oct 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio
  Tagging by Aligning with Label Text Description
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
31
0
0
28 Sep 2023
Exploring Self-Supervised Contrastive Learning of Spatial Sound Event
  Representation
Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
SSL
29
1
0
27 Sep 2023
Test-Time Training for Speech
Test-Time Training for Speech
Sri Harsha Dumpala
Chandramouli Shama Sastry
Sageev Oore
39
1
0
19 Sep 2023
An Effective Transformer-based Contextual Model and Temporal Gate
  Pooling for Speaker Identification
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
30
1
0
22 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
40
1
0
14 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised
  Pretraining
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
25
222
0
10 Aug 2023
Streaming Audio Transformers for Online Audio Tagging
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
34
4
0
29 May 2023
Pengi: An Audio Language Model for Audio Tasks
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
34
157
0
19 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
79
6
0
05 May 2023
MMViT: Multiscale Multiview Vision Transformers
MMViT: Multiscale Multiview Vision Transformers
Yuchen Liu
Natasha Ong
Kaiyan Peng
Bo Xiong
Qifan Wang
...
Madian Khabsa
Kaiyue Yang
David C. Liu
Donald Williamson
Hanchao Yu
ViT
22
4
0
28 Apr 2023
A vector quantized masked autoencoder for speech emotion recognition
A vector quantized masked autoencoder for speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
24
20
0
21 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
110
3,030
0
14 Apr 2023
On Robustness in Multimodal Learning
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
32
2
0
10 Apr 2023
Low-Complexity Audio Embedding Extractors
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
21
4
0
03 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with
  Transformers
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
35
4
0
03 Mar 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
42
7
0
28 Jan 2023
Randomized Quantization: A Generic Augmentation for Data Agnostic
  Self-supervised Learning
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
Huimin Wu
Chenyang Lei
Xiao Sun
Pengju Wang
Qifeng Chen
Kwang-Ting Cheng
Stephen Lin
Zhirong Wu
MQ
30
5
0
19 Dec 2022
Scaling Language-Image Pre-training via Masking
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
27
318
0
01 Dec 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
32
13
0
17 Nov 2022
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge
  Distillation
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Florian Schmid
Khaled Koutini
Gerhard Widmer
ViT
20
58
0
09 Nov 2022
MAEEG: Masked Auto-encoder for EEG Representation Learning
MAEEG: Masked Auto-encoder for EEG Representation Learning
H. Chien
Hanlin Goh
Christopher M. Sandino
Joseph Y. Cheng
17
48
0
27 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero
  Resource Speech Challenge
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
32
30
0
27 Oct 2022
Play It Back: Iterative Attention for Audio Recognition
Play It Back: Iterative Attention for Audio Recognition
Alexandros Stergiou
Dima Damen
37
4
0
20 Oct 2022
Token Merging: Your ViT But Faster
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
30
417
0
17 Oct 2022
12
Next