2104.01778
Cited By
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing "AST: Audio Spectrogram Transformer"
50 / 464 papers shown
Title
Bytes Are All You Need: Transformers Operating Directly On File Bytes
Maxwell Horton
Sachin Mehta
Ali Farhadi
Mohammad Rastegari
VLM
22
6
0
31 May 2023
E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks
Arshdeep Singh
Haohe Liu
Mark D. Plumbley
VLM
30
5
0
30 May 2023
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
34
4
0
29 May 2023
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
59
3
0
23 May 2023
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
Sangmin Bae
June-Woo Kim
Won-Yang Cho
Hyerim Baek
Soyoun Son
B. Lee
C. Ha
Kyongpil Tae
Sungnyun Kim
Se-Young Yun
20
29
0
23 May 2023
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Minki Kang
Wooseok Han
Sung Ju Hwang
Eunho Yang
DiffM
30
18
0
23 May 2023
Towards generalizing deep-audio fake detection networks
Konstantin Gasenzer
Moritz Wolter
36
4
0
22 May 2023
ZS-MSTM: Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding
Mireille Fares
Catherine Pelachaud
Nicolas Obin
22
0
0
22 May 2023
Listen, Think, and Understand
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
43
137
0
18 May 2023
MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition
Xinyu Gong
S. Mohan
Naina Dhingra
Jean-Charles Bazin
Yilei Li
Zhangyang Wang
Rakesh Ranjan
EgoV
56
17
0
12 May 2023
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
K. Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
18
17
0
11 May 2023
BIOT: Cross-data Biosignal Learning in the Wild
Chaoqi Yang
M. Brandon Westover
Jimeng Sun
18
9
0
10 May 2023
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
44
855
0
09 May 2023
Contrastive Speech Mixup for Low-resource Keyword Spotting
Dianwen Ng
Ruixi Zhang
J. Yip
Chong Zhang
Yukun Ma
Trung Hieu Nguyen
Chongjia Ni
Eng Siong Chng
B. Ma
38
10
0
02 May 2023
Transformer-based Sequence Labeling for Audio Classification based on MFCCs
C. Sonali
S. ChinmayiB
A. Balasubramanian
34
0
0
30 Apr 2023
MMViT: Multiscale Multiview Vision Transformers
Yuchen Liu
Natasha Ong
Kaiyan Peng
Bo Xiong
Qifan Wang
...
Madian Khabsa
Kaiyue Yang
David C. Liu
Donald Williamson
Hanchao Yu
ViT
33
4
0
28 Apr 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
28
6
0
22 Apr 2023
Denoising Cosine Similarity: A Theory-Driven Approach for Efficient Representation Learning
Takumi Nakagawa
Y. Sanada
Hiroki Waida
Yuhui Zhang
Yuichiro Wada
K. Takanashi
Tomonori Yamada
Takafumi Kanamori
DiffM
19
5
0
19 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jing Liu
VLM
34
103
0
17 Apr 2023
β-Variational autoencoders and transformers for reduced-order modelling of fluid flows
Alberto Solera-Rico
Carlos Sanmiguel Vila
Miguel Gómez-López
Yuning Wang
Abdulrahman Almashjary
Scott T. M. Dawson
Ricardo Vinuesa
DRL
16
74
0
07 Apr 2023
Efficient Audio Captioning Transformer with Patchout and Text Guidance
Thodoris Kouzelis
Grigoris Bastas
Athanasios Katsamanis
Alexandros Potamianos
ViT
30
6
0
06 Apr 2023
Efficient CNNs via Passive Filter Pruning
Arshdeep Singh
Mark D. Plumbley
24
1
0
05 Apr 2023
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu
Xiangqing Shen
Rui Xia
26
5
0
05 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
55
194
0
30 Mar 2023
Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-k Selection Discriminator
Yunhao Chen
Yunjie Zhu
Zihui Yan
Jian Shen
Zhen Ren
Yifan Huang
DiffM
39
8
0
27 Mar 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
27
1
0
24 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
40
1
0
21 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
32
29
0
20 Mar 2023
DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification
Yangfu Li
Jiapan Gan
Xiaodan Lin
24
6
0
20 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
37
22
0
19 Mar 2023
Weight-sharing Supernet for Searching Specialized Acoustic Event Classification Networks Across Device Constraints
Guan-Ting Lin
Qingming Tang
Chieh-Chi Kao
Viktor Rozgic
Chao Wang
28
0
0
18 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
71
14
0
14 Mar 2023
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu
Hanlin Lu
Jianbo Yuan
Xinyu Li
ViT
28
22
0
14 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
24
10
0
12 Mar 2023
AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer
Kang Li
Yan Song
Lirong Dai
Ian Mcloughlin
Xin Fang
Lin Liu
32
22
0
07 Mar 2023
Heterogeneous Graph Learning for Acoustic Event Classification
A. Shirian
Mona Ahmadian
Krishna Somandepalli
T. Guha
30
2
0
05 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
39
13
0
04 Mar 2023
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
24
4
0
03 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
41
4
0
03 Mar 2023
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
Nithish Muthuchamy Selvaraj
Xiaobao Guo
A. Kong
Bingquan Shen
Alex C. Kot
CLL
25
8
0
28 Feb 2023
Improving Speech Enhancement via Event-based Query
Yifei Xin
Xiulian Peng
Yan Lu
34
6
0
20 Feb 2023
A dataset for Audio-Visual Sound Event Detection in Movies
Rajat Hebbar
Digbalay Bose
Krishna Somandepalli
Veena Vijai
Shrikanth Narayanan
6
8
0
14 Feb 2023
SemanticAC: Semantics-Assisted Framework for Audio Classification
Yicheng Xiao
Yue Ma
Shuyan Li
Hantao Zhou
Ran Liao
Xiu Li
13
8
0
12 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
39
1
0
07 Feb 2023
Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data
Jang-Hyun Kim
Sangdoo Yun
Hyun Oh Song
34
18
0
29 Jan 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
39
20
0
23 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
26
15
0
19 Jan 2023
Does compressing activations help model parallel training?
S. Bian
Dacheng Li
Hongyi Wang
Eric P. Xing
Shivaram Venkataraman
21
5
0
06 Jan 2023
Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks
Zifan Jiang
A. Soldati
Isaac Schamberg
A. R. Lameira
Steven Moran
21
6
0
05 Jan 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
41
258
0
18 Dec 2022