2104.01778
Cited By
v1
v2
v3 (latest)
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing
"AST: Audio Spectrogram Transformer"
50 / 486 papers shown
Title
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
181
53
0
28 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
137
31
0
28 Sep 2022
InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
Mu Yuan
Lan Zhang
Fengxiang He
Xueting Tong
Miao-Hui Song
Zhengyuan Xu
Xiang-Yang Li
60
2
0
28 Sep 2022
UniKW-AT: Unified Keyword Spotting and Audio Tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
62
3
0
23 Sep 2022
An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition
Yang Wu
Pai Peng
Zhenyu Zhang
Yanyan Zhao
Bing Qin
45
1
0
20 Sep 2022
I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization
Dianwen Ng
J. Yip
Tanmay Surana
Zhao Yang
Chong Zhang
Yukun Ma
Chongjia Ni
Chng Eng Siong
B. Ma
91
6
0
14 Sep 2022
Classify Respiratory Abnormality in Lung Sounds Using STFT and a Fine-Tuned ResNet18 Network
Zizhao Chen
Hongliang Wang
Chia-Hui Yeh
Xilin Liu
34
16
0
30 Aug 2022
MuLan: A Joint Embedding of Music Audio and Natural Language
Qingqing Huang
A. Jansen
Joonseok Lee
Ravi Ganti
Judith Yue Li
D. Ellis
143
139
0
26 Aug 2022
Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers
Paul Primus
Gerhard Widmer
VLM
112
5
0
24 Aug 2022
A differentiable short-time Fourier transform with respect to the window length
Maxime Leiber
Axel Barrau
Y. Marnissi
D. Abboud
54
9
0
23 Aug 2022
Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition
Armand K. Koupai
M. J. Bocus
Raúl Santos-Rodríguez
Robert Piechocki
Ryan McConville
ViT
62
9
0
15 Aug 2022
Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion
Yuanbo Hou
Bo Kang
Dick Botteldooren
66
3
0
03 Aug 2022
Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding
Mireille Fares
Michele Grimaldi
Catherine Pelachaud
Nicolas Obin
63
18
0
03 Aug 2022
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
75
23
0
29 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
103
10
0
21 Jul 2022
Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
122
2
0
20 Jul 2022
COVID-19 Detection from Respiratory Sounds with Hierarchical Spectrogram Transformers
Idil Aytekin
Onat Dalmaz
Kaan Gonc
H. Ankishan
E. Saritas
Ulas Bagci
H. Celik
Tolga Çukur
60
12
0
19 Jul 2022
GAFX: A General Audio Feature eXtractor
Zhaoyang Bu
Han Zhang
Xiaohu Zhu
60
0
0
19 Jul 2022
Visually-aware Acoustic Event Detection using Heterogeneous Graphs
A. Shirian
Krishna Somandepalli
Victor Sanchez
T. Guha
61
3
0
16 Jul 2022
Segment-level Metric Learning for Few-shot Bioacoustic Event Detection
Haohe Liu
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
74
8
0
15 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming-Hsuan Yang
Serge Belongie
Huayu Chen
VLM
74
22
0
15 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
165
290
0
13 Jul 2022
Wayformer: Motion Forecasting via Simple & Efficient Attention Networks
Nigamaa Nayakanti
Rami Al-Rfou
Aurick Zhou
Kratarth Goel
Khaled S. Refaat
Benjamin Sapp
AI4TS
140
259
0
12 Jul 2022
EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use
Jan Schlüter
Gerald Gutenbrunner
VLM
60
13
0
12 Jul 2022
A Multi-tasking Model of Speaker-Keyword Classification for Keeping Human in the Loop of Drone-assisted Inspection
Yu Li
Anisha Parsan
Bill Wang
Penghao Dong
Shanshan Yao
Ruwen Qin
77
7
0
08 Jul 2022
BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
Sheng Kuang
Kiki van der Heijden
S. Mehrkanoon
33
3
0
08 Jul 2022
Data Augmentation for Dementia Detection in Spoken Language
Anna Hlédiková
Dominika Woszczyk
Alican Acman
Soteris Demetriou
Björn Schuller
70
13
0
26 Jun 2022
Avoid Overfitting User Specific Information in Federated Keyword Spotting
Xin-Chun Li
Jin-Lin Tang
Shaoming Song
Bingshuai Li
Yinchuan Li
Yunfeng Shao
Le Gan
De-Chuan Zhan
FedML
AAML
64
9
0
17 Jun 2022
Event-related data conditioning for acoustic event classification
Yuanbo Hou
Dick Botteldooren
59
3
0
16 Jun 2022
It's Time for Artistic Correspondence in Music and Video
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
64
37
0
14 Jun 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
74
28
0
20 May 2022
The AI Mechanic: Acoustic Vehicle Characterization Neural Networks
Adam M. Terwilliger
J. Siegel
67
2
0
19 May 2022
Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
69
6
0
17 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
84
16
0
11 May 2022
Robustness of Neural Architectures for Audio Event Detection
Juncheng Billy Li
Zheng Wang
Shuhui Qu
Florian Metze
40
1
0
06 May 2022
Pseudo strong labels for large scale weakly supervised audio tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
63
6
0
28 Apr 2022
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
Dading Chong
Helin Wang
Peilin Zhou
Qingcheng Zeng
82
68
0
27 Apr 2022
ATST: Audio Representation Learning with Teacher-Student Transformer
Xian Li
Xiaofei Li
ViT
58
22
0
26 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
100
59
0
15 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
146
43
0
06 Apr 2022
MetaAudio: A Few-Shot Audio Classification Benchmark
Calum Heggan
S. Budgett
Timothy M. Hospedales
Mehrdad Yaghoobi
VLM
86
33
0
05 Apr 2022
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
93
86
0
01 Apr 2022
A Temporal-oriented Broadcast ResNet for COVID-19 Detection
Xin Jing
Shuo Liu
Emilia Parada-Cabaleiro
Andreas Triantafyllopoulos
Meishu Song
Zijiang Yang
Björn W. Schuller
84
2
0
31 Mar 2022
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
Alan Baade
Puyuan Peng
David Harwath
84
102
0
30 Mar 2022
DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning
Sreyan Ghosh
Ashish Seth
Deepak Mittal
Maneesh Singh
S. Umesh
SSL
64
6
0
25 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
102
9
0
25 Mar 2022
CT-SAT: Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Zhaoyi Liu
Bo Kang
Yun Wang
Dick Botteldooren
ViT
64
5
0
22 Mar 2022
PACS: A Dataset for Physical Audiovisual CommonSense Reasoning
Samuel Yu
Peter Wu
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
LRM
120
16
0
21 Mar 2022
SepTr: Separable Transformer for Audio Spectrogram Processing
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Fahad Shahbaz Khan
ViT
96
32
0
17 Mar 2022
Learning Audio Representations with MLPs
Mashrur M. Morshed
Ahmad Omar Ahsan
H. Mahmud
Md. Kamrul Hasan
80
4
0
16 Mar 2022