arXiv:2104.01778
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing "AST: Audio Spectrogram Transformer"
50 / 463 papers shown
Title
EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use
Jan Schlüter
Gerald Gutenbrunner
VLM
39
12
0
12 Jul 2022
A Multi-tasking Model of Speaker-Keyword Classification for Keeping Human in the Loop of Drone-assisted Inspection
Yu Li
Anisha Parsan
Bill Wang
Penghao Dong
Shanshan Yao
Ruwen Qin
29
5
0
08 Jul 2022
BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
Sheng Kuang
Kiki van der Heijden
S. Mehrkanoon
19
3
0
08 Jul 2022
Data Augmentation for Dementia Detection in Spoken Language
Anna Hlédiková
Dominika Woszczyk
Alican Acman
Soteris Demetriou
Björn Schuller
36
12
0
26 Jun 2022
Avoid Overfitting User Specific Information in Federated Keyword Spotting
Xin-Chun Li
Jin-Lin Tang
Shaoming Song
Bingshuai Li
Yinchuan Li
Yunfeng Shao
Le Gan
De-Chuan Zhan
FedML
AAML
30
9
0
17 Jun 2022
Event-related data conditioning for acoustic event classification
Yuanbo Hou
Dick Botteldooren
28
3
0
16 Jun 2022
It's Time for Artistic Correspondence in Music and Video
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
19
37
0
14 Jun 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
31
24
0
20 May 2022
The AI Mechanic: Acoustic Vehicle Characterization Neural Networks
Adam M. Terwilliger
J. Siegel
22
2
0
19 May 2022
Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
30
5
0
17 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
34
13
0
11 May 2022
Robustness of Neural Architectures for Audio Event Detection
Juncheng Billy Li
Zheng Wang
Shuhui Qu
Florian Metze
8
1
0
06 May 2022
Pseudo strong labels for large scale weakly supervised audio tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
36
6
0
28 Apr 2022
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
Dading Chong
Helin Wang
Peilin Zhou
Qingcheng Zeng
39
65
0
27 Apr 2022
ATST: Audio Representation Learning with Teacher-Student Transformer
Xian Li
Xiaofei Li
ViT
23
20
0
26 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
36
53
0
15 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
49
39
0
06 Apr 2022
MetaAudio: A Few-Shot Audio Classification Benchmark
Calum Heggan
S. Budgett
Timothy M. Hospedales
Mehrdad Yaghoobi
VLM
41
32
0
05 Apr 2022
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
16
83
0
01 Apr 2022
A Temporal-oriented Broadcast ResNet for COVID-19 Detection
Xin Jing
Shuo Liu
Emilia Parada-Cabaleiro
Andreas Triantafyllopoulos
Meishu Song
Zijiang Yang
Björn W. Schuller
43
2
0
31 Mar 2022
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
Alan Baade
Puyuan Peng
David Harwath
25
95
0
30 Mar 2022
DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning
Sreyan Ghosh
Ashish Seth
Deepak Mittal
Maneesh Singh
S. Umesh
SSL
27
6
0
25 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
36
9
0
25 Mar 2022
CT-SAT: Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Zhaoyi Liu
Bo Kang
Yun Wang
Dick Botteldooren
ViT
32
5
0
22 Mar 2022
PACS: A Dataset for Physical Audiovisual CommonSense Reasoning
Samuel Yu
Peter Wu
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
LRM
36
13
0
21 Mar 2022
SepTr: Separable Transformer for Audio Spectrogram Processing
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Fahad Shahbaz Khan
ViT
23
30
0
17 Mar 2022
Learning Audio Representations with MLPs
Mashrur M. Morshed
Ahmad Omar Ahsan
H. Mahmud
Md. Kamrul Hasan
27
4
0
16 Mar 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
25
29
0
13 Mar 2022
Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs
Jingfei Xia
Mingchen Zhuge
Tiantian Geng
Shun Fan
Yuantai Wei
Zhenyu He
Feng Zheng
23
14
0
08 Mar 2022
HEAR: Holistic Evaluation of Audio Representations
Joseph P. Turian
Jordie Shier
H. Khan
Bhiksha Raj
Björn W. Schuller
...
P. Esling
Pranay Manocha
Shinji Watanabe
Zeyu Jin
Yonatan Bisk
39
100
0
06 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
Saghir Alfasly
Jian Lu
C. Xu
Yuru Zou
42
18
0
06 Mar 2022
TRILLsson: Distilled Universal Paralinguistic Speech Representations
Joel Shor
Subhashini Venugopalan
25
37
0
01 Mar 2022
Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, And Pretraining: An Ablation Study
Daniel C. Tompkins
Kshitiz Kumar
Jian Wu
17
5
0
07 Feb 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
127
264
0
02 Feb 2022
Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data
A. Shirian
Krishna Somandepalli
T. Guha
SSL
41
10
0
31 Jan 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
31
0
0
28 Jan 2022
Continual Transformers: Redundancy-Free Attention for Online Inference
Lukas Hedegaard
Arian Bakhtiarnia
Alexandros Iosifidis
CLL
27
11
0
17 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting
Dianwen Ng
Yunqi Chen
Biao Tian
Qiang Fu
Chng Eng Siong
24
46
0
15 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
42
207
0
07 Jan 2022
Learning Operators with Coupled Attention
Georgios Kissas
Jacob H. Seidman
Leonardo Ferreira Guilhoto
V. Preciado
George J. Pappas
P. Perdikaris
32
110
0
04 Jan 2022
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
Yanpeng Zhao
Jack Hessel
Youngjae Yu
Ximing Lu
Rowan Zellers
Yejin Choi
17
27
0
16 Dec 2021
Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
19
43
0
15 Dec 2021
Responding to Challenge Call of Machine Learning Model Development in Diagnosing Respiratory Disease Sounds
Negin Melek
26
2
0
29 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
35
73
0
25 Nov 2021
Towards Learning Universal Audio Representations
Luyu Wang
Pauline Luc
Yan Wu
Adrià Recasens
Lucas Smaira
...
Andrew Jaegle
Jean-Baptiste Alayrac
Sander Dieleman
João Carreira
Aaron van den Oord
SSL
32
68
0
23 Nov 2021
MT3: Multi-Task Multitrack Music Transcription
Josh Gardner
Ian Simon
Ethan Manilow
Curtis Hawthorne
Jesse Engel
37
95
0
04 Nov 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
38
267
0
19 Oct 2021
DECAR: Deep Clustering for learning general-purpose Audio Representations
Sreyan Ghosh
Sandesh V Katta
Ashish Seth
S. Umesh
SSL
36
12
0
17 Oct 2021
Attention-Free Keyword Spotting
Mashrur M. Morshed
Ahmad Omar Ahsan
33
9
0
14 Oct 2021