2104.01778
v3 (latest)
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing "AST: Audio Spectrogram Transformer"
50 / 486 papers shown
Title
Audio Explanation Synthesis with Generative Foundation Models
Alican Akman
Qiyang Sun
Björn W. Schuller
75
1
0
10 Oct 2024
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William B. Held
Ella Li
Michael Joseph Ryan
Weiyan Shi
Yanzhe Zhang
Diyi Yang
AuLLM
87
16
0
03 Oct 2024
Probabilistic road classification in historical maps using synthetic data and deep learning
Dominik J. Mühlematter
Sebastian Schweizer
Chenjing Jiao
Xue Xia
M. Heitzler
L. Hurni
68
0
0
03 Oct 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
129
3
0
02 Oct 2024
Pre-training with Synthetic Patterns for Audio
Yuchi Ishikawa
Tatsuya Komatsu
Yoshimitsu Aoki
63
0
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
208
26
0
01 Oct 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
163
7
0
27 Sep 2024
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
Pengfei Cai
Yan Song
Nan Jiang
Qing Gu
Ian Mcloughlin
60
2
0
26 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
132
1
0
25 Sep 2024
Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
Burooj Ghani
Vincent J. Kalkman
Bob Planqué
Willem-Pier Vellinga
L. Gill
Dan Stowell
VLM
71
6
0
21 Sep 2024
DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
Dongheon Lee
Jung-Woo Choi
Mamba
61
4
0
19 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
109
0
0
17 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
126
7
0
17 Sep 2024
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
183
2
0
17 Sep 2024
MusicLIME: Explainable Multimodal Music Understanding
Theodoros Sotirou
Vassilis Lyberatos
Orfeas Menis Mastromichalakis
Giorgos Stamou
75
3
0
16 Sep 2024
A Survey of Foundation Models for Music Understanding
Wenjun Li
Ying Cai
Ziyang Wu
Wenyi Zhang
Yifan Chen
...
Junwei Han
Bao Ge
Tianming Liu
Lin Gan
Tuo Zhang
120
2
0
15 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schluter
Paul Primus
Gerhard Widmer
ViT
67
2
0
14 Sep 2024
Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
Guimin Hu
Yi Xin
Weimin Lyu
Haojian Huang
Chang Sun
Zehan Zhu
Lin Gui
Ruichu Cai
Erik Cambria
Hasti Seifi
105
6
0
11 Sep 2024
Benchmarking Sub-Genre Classification For Mainstage Dance Music
Hongzhi Shu
Xinglin Li
Hongyu Jiang
Minghao Fu
Xinyu Li
44
0
0
10 Sep 2024
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
A. Sridhar
Yinyi Guo
Erik M. Visser
AuLLM
105
0
0
10 Sep 2024
Continuous Learning of Transformer-based Audio Deepfake Detection
Tuan Duy Nguyen Le
Kah Kuan Teh
Huy Dat Tran
ViT
59
2
0
09 Sep 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
113
4
0
29 Aug 2024
Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers
Qian Wang
Zhaoyang Bu
Jiaxuan Mao
Wenyu Zhu
Jingya Zhao
Wei Du
Guochao Shi
Min Zhou
Si Chen
Jieming Qu
MedIm
72
0
0
28 Aug 2024
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Md Awsafur Rahman
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Bishmoy Paul
S. Fattah
179
11
0
26 Aug 2024
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
Jingyu Liu
Minquan Wang
Ye Ma
Bo Wang
Aozhu Chen
Quan Chen
Peng Jiang
Xirong Li
133
1
0
23 Aug 2024
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong-Son Hy
110
0
0
11 Aug 2024
SAMSA: Efficient Transformer for Many Data Modalities
Minh Lenhat
Viet Anh Nguyen
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong-Son Hy
116
0
0
10 Aug 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun
Ruohan Gao
Ishwarya Ananthabhotla
Anurag Kumar
Jacob Donley
Chao Li
Gunhee Kim
V. Ithapu
Calvin Murdock
84
2
0
09 Aug 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
111
8
0
29 Jul 2024
PersonalityScanner: Exploring the Validity of Personality Assessment Based on Multimodal Signals in Virtual Reality
Xintong Zhang
Di Lu
Huiqi Hu
Nan Jiang
Xianhao Yu
Jinan Xu
Yujia Peng
Qing Li
Wenjuan Han
77
1
0
29 Jul 2024
I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition
Yannis Vasilakis
Rachel M. Bittner
Johan Pauwels
87
1
0
25 Jul 2024
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
Seong-Gyun Leem
Daniel Fulford
J. Onnela
David Gard
Carlos Busso
73
1
0
25 Jul 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Tuomas Virtanen
Björn Schuller
100
4
0
22 Jul 2024
Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training
Ye Lin Tun
Chu Myaet Thwal
Minh N. H. Nguyen
Choong Seon Hong
87
0
0
22 Jul 2024
Integrating IP Broadcasting with Audio Tags: Workflow and Challenges
Rhys Burchett-Vass
Arshdeep Singh
Gabriel Bibbó
Mark D. Plumbley
75
0
0
22 Jul 2024
AudioInsight: Detecting Social Contexts Relevant to Social Anxiety from Speech
Varun Reddy
Zhiyuan Wang
Emma R. Toner
Max Larrazabal
M. Boukhechba
B. Teachman
Laura E. Barnes
58
4
0
19 Jul 2024
Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models
Xuenan Xu
Pingyue Zhang
Ming Yan
Ji Zhang
Mengyue Wu
VLM
126
0
0
19 Jul 2024
Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons
Gauthier Boeshertz
Giacomo Indiveri
M. Nair
Alpha Renner
48
2
0
18 Jul 2024
Towards Enhanced Classification of Abnormal Lung sound in Multi-breath: A Light Weight Multi-label and Multi-head Attention Classification Method
Yi-Wei Chua
Yun-Chien Cheng
62
0
0
15 Jul 2024
Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control
Yu-Hua Chen
Yen-Tung Yeh
Yuan-Chiao Cheng
Jui-Te Wu
Yu-Hsiang Ho
J. Jang
Yi-Hsuan Yang
72
6
0
15 Jul 2024
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
77
1
0
11 Jul 2024
From Real to Cloned Singer Identification
Dorian Desblancs
Gabriel Meseguer-Brocal
Romain Hennequin
Manuel Moussallam
105
1
0
11 Jul 2024
SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness
Jie Lin
Xiuping Yang
Li Xiao
Xinhong Li
Weiyan Yi
Yuhong Yang
Weiping Tu
Xiong Chen
125
0
0
10 Jul 2024
ASGIR: Audio Spectrogram Transformer Guided Classification And Information Retrieval For Birds
Yashwardhan Chaudhuri
Paridhi Mundra
Arnesh Batra
Orchid Chetia Phukan
Arun Balaji Buduru
68
1
0
10 Jul 2024
VoxMed: One-Step Respiratory Disease Classifier using Digital Stethoscope Sounds
Paridhi Mundra
Manik Sharma
Yashwardhan Chaudhuri
Orchid Chetia Phukan
Arun Balaji Buduru
37
0
0
10 Jul 2024
Cue Point Estimation using Object Detection
Giulia Argüello
Luca A. Lanzendörfer
Roger Wattenhofer
60
1
0
09 Jul 2024
Towards Attention-based Contrastive Learning for Audio Spoof Detection
C. Goel
Surya Koppisetti
Ben Colman
Ali Shahriyari
Gaurav Bharaj
112
7
0
03 Jul 2024
Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
Hok-Shing Lau
Mark Huntly
Nathon Morgan
Adesua Iyenoma
Biao Zeng
Tim Bashford
89
1
0
29 Jun 2024
A Simple Attention-Based Mechanism for Bimodal Emotion Classification
Mazen Elabd
S. Jaf
31
0
0
28 Jun 2024
ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data
Zeyi Liu
Cheng Chi
Eric A. Cousineau
Naveen Kuppuswamy
Benjamin Burchfiel
Shuran Song
VGen
90
33
0
27 Jun 2024