MAST: Multiscale Audio Spectrogram Transformers

2 November 2022
Sreyan Ghosh
Ashish Seth
S. Umesh
Tianyi Zhou

Papers citing "MAST: Multiscale Audio Spectrogram Transformers"

12 / 12 papers shown
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
111
3
0
02 Oct 2024
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
153
693
0
02 Dec 2021
Towards Learning Universal Audio Representations
Luyu Wang
Pauline Luc
Yan Wu
Adrià Recasens
Lucas Smaira
...
Andrew Jaegle
Jean-Baptiste Alayrac
Sander Dieleman
João Carreira
Aaron van den Oord
SSL
108
71
0
23 Nov 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
135
1,265
0
22 Apr 2021
An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen
Saining Xie
Kaiming He
ViT
159
1,871
0
05 Apr 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
142
883
0
05 Apr 2021
Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
90
68
0
05 Mar 2021
Y-Vector: Multiscale Waveform Encoder for Speaker Embedding
Ge Zhu
Fei Jiang
Z. Duan
58
25
0
24 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
299
5,849
0
20 Jun 2020
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
213
12,136
0
13 Nov 2019
A multi-device dataset for urban acoustic scene classification
A. Mesaros
Toni Heittola
Tuomas Virtanen
35
381
0
25 Jul 2018
VoxCeleb: a large-scale speaker identification dataset
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
127
2,283
0
26 Jun 2017