ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.01778
  4. Cited By
AST: Audio Spectrogram Transformer
v1v2v3 (latest)

AST: Audio Spectrogram Transformer

5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
    ViT
ArXiv (abs)PDFHTML

Papers citing "AST: Audio Spectrogram Transformer"

50 / 486 papers shown
Title
DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter
  for Speaker Verification
DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification
Yangfu Li
Jiapan Gan
Xiaodan Lin
57
6
0
20 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio
  Classification
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
87
22
0
19 Mar 2023
Weight-sharing Supernet for Searching Specialized Acoustic Event
  Classification Networks Across Device Constraints
Weight-sharing Supernet for Searching Specialized Acoustic Event Classification Networks Across Device Constraints
Guan-Ting Lin
Qingming Tang
Chieh-Chi Kao
Viktor Rozgic
Chao Wang
93
1
0
18 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet
  Tag-guided Synthetic Data
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
164
15
0
14 Mar 2023
CAT: Causal Audio Transformer for Audio Classification
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu
Hanlin Lu
Jianbo Yuan
Xinyu Li
ViT
88
24
0
14 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
78
10
0
12 Mar 2023
AST-SED: An Effective Sound Event Detection Method Based on Audio
  Spectrogram Transformer
AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer
Kang Li
Yan Song
Lirong Dai
Ian Mcloughlin
Xin Fang
Lin Liu
78
22
0
07 Mar 2023
Heterogeneous Graph Learning for Acoustic Event Classification
Heterogeneous Graph Learning for Acoustic Event Classification
A. Shirian
Mona Ahmadian
Krishna Somandepalli
T. Guha
71
2
0
05 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
89
14
0
04 Mar 2023
Low-Complexity Audio Embedding Extractors
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
60
4
0
03 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with
  Transformers
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
69
4
0
03 Mar 2023
Adapter Incremental Continual Learning of Efficient Audio Spectrogram
  Transformers
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
Nithish Muthuchamy Selvaraj
Xiaobao Guo
A. Kong
Bingquan Shen
Alex C. Kot
CLL
51
8
0
28 Feb 2023
Improving Speech Enhancement via Event-based Query
Improving Speech Enhancement via Event-based Query
Yifei Xin
Xiulian Peng
Yan Lu
64
6
0
20 Feb 2023
A dataset for Audio-Visual Sound Event Detection in Movies
A dataset for Audio-Visual Sound Event Detection in Movies
Rajat Hebbar
Digbalay Bose
Krishna Somandepalli
Veena Vijai
Shrikanth Narayanan
58
9
0
14 Feb 2023
SemanticAC: Semantics-Assisted Framework for Audio Classification
SemanticAC: Semantics-Assisted Framework for Audio Classification
Yicheng Xiao
Yue Ma
Shuyan Li
Hantao Zhou
Ran Liao
Xiu Li
57
9
0
12 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
72
1
0
07 Feb 2023
Neural Relation Graph: A Unified Framework for Identifying Label Noise
  and Outlier Data
Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data
Jang-Hyun Kim
Sangdoo Yun
Hyun Oh Song
85
19
0
29 Jan 2023
Zorro: the masked multimodal transformer
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
92
21
0
23 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
86
17
0
19 Jan 2023
Does compressing activations help model parallel training?
Does compressing activations help model parallel training?
S. Bian
Dacheng Li
Hongyi Wang
Eric P. Xing
Shivaram Venkataraman
77
9
0
06 Jan 2023
Automatic Sound Event Detection and Classification of Great Ape Calls
  Using Neural Networks
Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks
Zifan Jiang
A. Soldati
Isaac Schamberg
A. R. Lameira
Steven Moran
94
7
0
05 Jan 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
124
299
0
18 Dec 2022
Learning from Taxonomy: Multi-label Few-Shot Classification for Everyday
  Sound Recognition
Learning from Taxonomy: Multi-label Few-Shot Classification for Everyday Sound Recognition
Jinhua Liang
Huy P Phan
Emmanouil Benetos
122
12
0
17 Dec 2022
MAViL: Masked Audio-Video Learners
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
85
54
0
15 Dec 2022
CLIPPO: Image-and-Language Understanding from Pixels Only
CLIPPO: Image-and-Language Understanding from Pixels Only
Michael Tschannen
Basil Mustafa
N. Houlsby
CLIPVLM
104
49
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
119
78
0
15 Dec 2022
Audiovisual Masked Autoencoders
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
120
45
0
09 Dec 2022
Federated Learning for Inference at Anytime and Anywhere
Federated Learning for Inference at Anytime and Anywhere
Zicheng Liu
Da Li
Javier Fernandez-Marques
Stefanos Laskaridis
Yan Gao
Łukasz Dudziak
Stan Z. Li
S. Hu
Timothy M. Hospedales
FedML
88
5
0
08 Dec 2022
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance
  Generation
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
Ronghui Li
Junfan Zhao
Yachao Zhang
Mingyang Su
Zeping Ren
Han Zhang
Yansong Tang
Xiuhua Li
DiffM
118
57
0
07 Dec 2022
Learning General Audio Representations with Large-Scale Training of
  Patchout Audio Transformers
Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers
Khaled Koutini
Shahed Masoudian
Florian Schmid
Hamid Eghbalzadeh
Jan Schluter
Gerhard Widmer
149
6
0
25 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video
  Representation Learning
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
112
23
0
25 Nov 2022
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event
  Classification
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
Sara Atito
Muhammad Awais
Wenwu Wang
Mark D. Plumbley
J. Kittler
ViT
71
11
0
23 Nov 2022
Ontology-aware Learning and Evaluation for Audio Tagging
Ontology-aware Learning and Evaluation for Audio Tagging
Haohe Liu
Qiuqiang Kong
Xubo Liu
Xinhao Mei
Wenwu Wang
Mark D. Plumbley
47
4
0
22 Nov 2022
Random-LTD: Random and Layerwise Token Dropping Brings Efficient
  Training for Large-scale Transformers
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Z. Yao
Xiaoxia Wu
Conglong Li
Connor Holmes
Minjia Zhang
Cheng-rong Li
Yuxiong He
87
12
0
17 Nov 2022
Music Instrument Classification Reprogrammed
Music Instrument Classification Reprogrammed
Hsin-Hung Chen
Alexander Lerch
103
4
0
15 Nov 2022
The Birds Need Attention Too: Analysing usage of Self Attention in
  identifying bird calls in soundscapes
The Birds Need Attention Too: Analysing usage of Self Attention in identifying bird calls in soundscapes
Chandra Kanth Nagesh
Abhishek Purushothama
53
2
0
14 Nov 2022
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge
  Distillation
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Florian Schmid
Khaled Koutini
Gerhard Widmer
ViT
86
60
0
09 Nov 2022
Effective Audio Classification Network Based on Paired Inverse Pyramid
  Structure and Dense MLP Block
Effective Audio Classification Network Based on Paired Inverse Pyramid Structure and Dense MLP Block
Yunhao Chen
Yunjie Zhu
Zihui Yan
Yifan Huang
Zhen Ren
Jianlu Shen
Lifang Chen
123
9
0
05 Nov 2022
Integrated Parameter-Efficient Tuning for General-Purpose Audio Models
Integrated Parameter-Efficient Tuning for General-Purpose Audio Models
Ju-ho Kim
Ju-Sung Heo
Hyun-Seo Shin
Chanmann Lim
Ha-Jin Yu
35
5
0
04 Nov 2022
MAST: Multiscale Audio Spectrogram Transformers
MAST: Multiscale Audio Spectrogram Transformers
Sreyan Ghosh
Ashish Seth
S. Umesh
Tianyi Zhou
85
3
0
02 Nov 2022
Low-Resource Music Genre Classification with Cross-Modal Neural Model
  Reprogramming
Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming
Yun-Ning Hung
Chao-Han Huck Yang
Pin-Yu Chen
Alexander Lerch
100
19
0
02 Nov 2022
Masked Modeling Duo: Learning Representations by Encouraging Both
  Networks to Model the Input
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
105
33
0
26 Oct 2022
Audio MFCC-gram Transformers for respiratory insufficiency detection in
  COVID-19
Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19
M. Gauy
Marcelo Finger
56
9
0
25 Oct 2022
GCT: Gated Contextual Transformer for Sequential Audio Tagging
GCT: Gated Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Yun Wang
Wenwu Wang
Dick Botteldooren
64
0
0
22 Oct 2022
Description and analysis of novelties introduced in DCASE Task 4 2022 on
  the baseline system
Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system
Francesca Ronchini
Samuele Cornell
Romain Serizel
Nicolas Turpault
Eduardo Fonseca
D. Ellis
55
14
0
14 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Emily R. Bartusiak
Edward J. Delp
60
14
0
14 Oct 2022
Supervised and Unsupervised Learning of Audio Representations for Music
  Understanding
Supervised and Unsupervised Learning of Audio Representations for Music Understanding
Matthew C. McCallum
Filip Korzeniowski
Sergio Oramas
F. Gouyon
Andreas F. Ehmann
SSL
139
41
0
07 Oct 2022
PSVRF: Learning to restore Pitch-Shifted Voice without reference
Yangfu Li
Xiaodan Lin
Jiaxin Yang
60
0
0
06 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
122
128
0
02 Oct 2022
An empirical study of weakly supervised audio tagging embeddings for
  general audio representations
An empirical study of weakly supervised audio tagging embeddings for general audio representations
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
65
1
0
30 Sep 2022
Previous
123...10789
Next