2104.01778
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing
"AST: Audio Spectrogram Transformer"
50 / 486 papers shown
Title
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
105
150
0
20 Jul 2023
Exploring Transformer Extrapolation
Zhen Qin
Yiran Zhong
Huiyuan Deng
60
9
0
19 Jul 2023
From West to East: Who can understand the music of the others better?
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
61
5
0
19 Jul 2023
Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer
Honglin Mu
Wentian Xia
Wanxiang Che
64
1
0
19 Jul 2023
FlexiAST: Flexibility is What AST Needs
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
57
3
0
18 Jul 2023
AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023
Kin Wai Lau
Yasar Abbas Ur Rehman
Yuyang Xie
Lan Ma
75
1
0
14 Jul 2023
EchoVest: Real-Time Sound Classification and Depth Perception Expressed through Transcutaneous Electrical Nerve Stimulation
Jesse Choe
Siddhant Sood
Ryan Park
21
0
0
10 Jul 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Yuan Gong
Sameer Khurana
Leonid Karlinsky
James R. Glass
88
71
0
06 Jul 2023
Dataset balancing can hurt model performance
R. C. Moore
D. Ellis
Eduardo Fonseca
Shawn Hershey
A. Jansen
Manoj Plakal
80
9
0
30 Jun 2023
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos
Chiori Hori
Puyuan Peng
David Harwath
Xinyu Liu
Keita Ota
Siddarth Jain
Radu Corcodel
Devesh K. Jha
Diego Romeres
Jonathan Le Roux
55
4
0
27 Jun 2023
Learning Unseen Modality Interaction
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
137
6
0
22 Jun 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
70
2
0
21 Jun 2023
On Frequency-Wise Normalizations for Better Recording Device Generalization in Audio Spectrogram Transformers
Paul Primus
Gerhard Widmer
78
0
0
20 Jun 2023
Multi-task Learning for Radar Signal Characterisation
Zi Huang
Akila Pemasiri
Simon Denman
Clinton Fookes
Terrence Martin
56
8
0
19 Jun 2023
Channel-Spatial-Based Few-Shot Bird Sound Event Detection
Lingwen Liu
Yuxuan Feng
Haitao Fu
Yajie Yang
Xin Pan
Chenlei Jin
56
0
0
18 Jun 2023
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks
K. Paim
Ricardo Rohweder
M. R. Mendoza
R. Mansilha
Weverton Cordeiro
62
5
0
16 Jun 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
65
1
0
12 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
94
212
0
11 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
103
28
0
07 Jun 2023
Learning Local to Global Feature Aggregation for Speech Emotion Recognition
Cheng Lu
Hailun Lian
Wenming Zheng
Yuan Zong
Yan Zhao
Sunan Li
ViT
52
7
0
02 Jun 2023
Adapting a ConvNeXt model to audio classification on AudioSet
Thomas Pellegrini
Ismail Khalfaoui-Hassani
Etienne Labbé
T. Masquelier
101
23
0
01 Jun 2023
How to Estimate Model Transferability of Pre-Trained Speech Models?
Zih-Ching Chen
Chao-Han Huck Yang
Yue Liu
Yu Zhang
Nanxin Chen
Shoufeng Chang
Rohit Prabhavalkar
Hung-yi Lee
Tara N. Sainath
155
9
0
01 Jun 2023
Bytes Are All You Need: Transformers Operating Directly On File Bytes
Maxwell Horton
Sachin Mehta
Ali Farhadi
Mohammad Rastegari
VLM
102
7
0
31 May 2023
E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks
Arshdeep Singh
Haohe Liu
Mark D. Plumbley
VLM
68
5
0
30 May 2023
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
89
4
0
29 May 2023
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
114
4
0
23 May 2023
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
Sangmin Bae
June-Woo Kim
Won-Yang Cho
Hyerim Baek
Soyoun Son
B. Lee
C. Ha
Kyongpil Tae
Sungnyun Kim
Se-Young Yun
52
36
0
23 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Minki Kang
Wooseok Han
Sung Ju Hwang
Eunho Yang
DiffM
87
19
0
23 May 2023
Towards generalizing deep-audio fake detection networks
Konstantin Gasenzer
Moritz Wolter
75
4
0
22 May 2023
ZS-MSTM: Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding
Mireille Fares
Catherine Pelachaud
Nicolas Obin
63
0
0
22 May 2023
Listen, Think, and Understand
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
144
161
0
18 May 2023
MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition
Xinyu Gong
S. Mohan
Naina Dhingra
Jean-Charles Bazin
Yilei Li
Zhangyang Wang
Rakesh Ranjan
EgoV
129
19
0
12 May 2023
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
Kai Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
88
22
0
11 May 2023
BIOT: Cross-data Biosignal Learning in the Wild
Chaoqi Yang
M. Brandon Westover
Jimeng Sun
64
10
0
10 May 2023
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
219
945
0
09 May 2023
Contrastive Speech Mixup for Low-resource Keyword Spotting
Dianwen Ng
Ruixi Zhang
J. Yip
Chong Zhang
Yukun Ma
Trung Hieu Nguyen
Chongjia Ni
Eng Siong Chng
B. Ma
93
10
0
02 May 2023
Transformer-based Sequence Labeling for Audio Classification based on MFCCs
C. Sonali
S. ChinmayiB
A. Balasubramanian
74
0
0
30 Apr 2023
MMViT: Multiscale Multiview Vision Transformers
Yuchen Liu
Natasha Ong
Kaiyan Peng
Bo Xiong
Qifan Wang
...
Madian Khabsa
Kaiyue Yang
David C. Liu
Donald Williamson
Hanchao Yu
ViT
68
4
0
28 Apr 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
60
7
0
22 Apr 2023
Denoising Cosine Similarity: A Theory-Driven Approach for Efficient Representation Learning
Takumi Nakagawa
Y. Sanada
Hiroki Waida
Yuhui Zhang
Yuichiro Wada
K. Takanashi
Tomonori Yamada
Takafumi Kanamori
DiffM
64
5
0
19 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jing Liu
VLM
136
112
0
17 Apr 2023
β-Variational autoencoders and transformers for reduced-order modelling of fluid flows
Alberto Solera-Rico
Carlos Sanmiguel Vila
Miguel Gómez-López
Yuning Wang
Abdulrahman Almashjary
Scott T. M. Dawson
Ricardo Vinuesa
DRL
109
91
0
07 Apr 2023
Efficient Audio Captioning Transformer with Patchout and Text Guidance
Thodoris Kouzelis
Grigoris Bastas
Athanasios Katsamanis
Alexandros Potamianos
ViT
88
6
0
06 Apr 2023
Efficient CNNs via Passive Filter Pruning
Arshdeep Singh
Mark D. Plumbley
51
1
0
05 Apr 2023
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu
Xiangqing Shen
Rui Xia
121
4
0
05 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
181
220
0
30 Mar 2023
Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-k Selection Discriminator
Yunhao Chen
Yunjie Zhu
Zihui Yan
Jian Shen
Zhen Ren
Yifan Huang
DiffM
85
8
0
27 Mar 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
76
1
0
24 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
63
1
0
21 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
74
31
0
20 Mar 2023