2204.12768
Cited By
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
27 April 2022
Dading Chong
Helin Wang
Peilin Zhou
Qingcheng Zeng
Papers citing "Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training"
50 / 50 papers shown
Title
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
59
0
0
20 Mar 2025
When Vision Models Meet Parameter Efficient Look-Aside Adapters Without Large-Scale Audio Pretraining
Juan Yeo
Jinkwan Jang
Kyubyung Chae
Seongkyu Mun
Taesup Kim
VLM
57
0
0
08 Dec 2024
Self-Supervised Radio Pre-training: Toward Foundational Models for Spectrogram Learning
Ahmed Aboulfotouh
Ashkan Eshaghbeigi
Dimitrios Karslidis
Hatem Abou-Zeid
SSL
25
1
0
14 Nov 2024
Pre-training with Synthetic Patterns for Audio
Yuchi Ishikawa
Tatsuya Komatsu
Yoshimitsu Aoki
28
0
0
01 Oct 2024
DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
Shota Nakada
Taichi Nishimura
Hokuto Munakata
Masayoshi Kondo
Tatsuya Komatsu
CLIP
VLM
30
0
0
18 Sep 2024
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Amin Karimi Monsefi
Mengxi Zhou
Nastaran Karimi Monsefi
Ser-Nam Lim
Wei-Lun Chao
R. Ramnath
41
1
0
16 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schluter
Paul Primus
Gerhard Widmer
ViT
23
2
0
14 Sep 2024
Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification
Yiqiang Cai
Shengchen Li
Xi Shao
24
3
0
27 Aug 2024
Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation
M. Gauy
Natalia Hitomi Koza
Ricardo Mikio Morita
Gabriel Rocha Stanzione
Arnaldo Cândido Júnior
L. Berti
A. S. Levin
E. Sabino
F. Svartman
Marcelo Finger
38
0
0
30 Jul 2024
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
23
1
0
11 Jul 2024
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
34
10
0
09 Apr 2024
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
Afrina Tabassum
Dung N. Tran
Trung D. Q. Dang
Ismini Lourentzou
K. Koishida
47
0
0
14 Mar 2024
Inductive Graph Alignment Prompt: Bridging the Gap between Graph Pre-training and Inductive Fine-tuning From Spectral Perspective
Yuchen Yan
Peiyan Zhang
Zheng Fang
Qingqing Long
AI4CE
25
13
0
21 Feb 2024
CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-based Masking for Speech Emotion Recognition
Ioannis Ziogas
Hessa Alfalahi
A. Khandoker
L. Hadjileontiadis
24
0
0
10 Feb 2024
Masked Audio Modeling with CLAP and Multi-Objective Learning
Yifei Xin
Xiulian Peng
Yan Lu
44
8
0
29 Jan 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen
Yuzhe Liang
Ziyang Ma
Zhisheng Zheng
Xie Chen
ViT
46
17
0
07 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
33
14
0
31 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
25
17
0
27 Nov 2023
Efficient Supervised Training of Audio Transformers for Music Representation Learning
Pablo Alonso-Jiménez
Xavier Serra
Dmitry Bogdanov
ViT
32
3
0
28 Sep 2023
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
Meng Liu
K. Liang
Dayu Hu
Hao Yu
Yue Liu
Lingyuan Meng
Wenxuan Tu
Sihang Zhou
Xinwang Liu
18
25
0
21 Sep 2023
Test-Time Training for Speech
Sri Harsha Dumpala
Chandramouli Shama Sastry
Sageev Oore
39
1
0
19 Sep 2023
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
L. Pepino
Pablo Riera
Luciana Ferrer
32
4
0
14 Sep 2023
CED: Consistent ensemble distillation for audio tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
20
17
0
23 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
40
1
0
14 Aug 2023
FlexiAST: Flexibility is What AST Needs
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
21
3
0
18 Jul 2023
On Frequency-Wise Normalizations for Better Recording Device Generalization in Audio Spectrogram Transformers
Paul Primus
Gerhard Widmer
22
0
0
20 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
21
25
0
07 Jun 2023
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
26
4
0
29 May 2023
Extending Audio Masked Autoencoders Toward Audio Restoration
Zhi-Wei Zhong
Hao Shi
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
24
4
0
11 May 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
28
6
0
22 Apr 2023
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
32
2
0
10 Apr 2023
Efficient CNNs via Passive Filter Pruning
Arshdeep Singh
Mark D. Plumbley
13
1
0
05 Apr 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
54
325
0
29 Mar 2023
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
18
4
0
03 Mar 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
37
20
0
23 Jan 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
32
254
0
18 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
32
43
0
09 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
25
317
0
01 Dec 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
19
21
0
25 Nov 2022
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
Sara Atito
Muhammad Awais
Wenwu Wang
Mark D. Plumbley
J. Kittler
ViT
18
9
0
23 Nov 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
29
13
0
17 Nov 2022
Feature-augmented Machine Reading Comprehension with Auxiliary Tasks
Yifeng Xie
26
0
0
17 Nov 2022
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
34
30
0
26 Oct 2022
Play It Back: Iterative Attention for Audio Recognition
Alexandros Stergiou
Dima Damen
28
4
0
20 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
29
119
0
02 Oct 2022
GAFX: A General Audio Feature eXtractor
Zhaoyang Bu
Han Zhang
Xiaohu Zhu
30
0
0
19 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
13
268
0
13 Jul 2022
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
26
65
0
26 Apr 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
32
835
0
07 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021