2104.01778
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing "AST: Audio Spectrogram Transformer"
50 / 463 papers shown
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Md Awsafur Rahman
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Bishmoy Paul
S. Fattah
46
7
0
26 Aug 2024
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
Jingyu Liu
Minquan Wang
Ye Ma
Bo Wang
Aozhu Chen
Quan Chen
Peng Jiang
Xirong Li
48
1
0
23 Aug 2024
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong-Son Hy
46
0
0
11 Aug 2024
SAMSA: Efficient Transformer for Many Data Modalities
Minh Lenhat
Viet Anh Nguyen
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong-Son Hy
54
0
0
10 Aug 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun
Ruohan Gao
Ishwarya Ananthabhotla
Anurag Kumar
Jacob Donley
Chao Li
Gunhee Kim
V. Ithapu
Calvin Murdock
50
2
0
09 Aug 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
48
7
0
29 Jul 2024
PersonalityScanner: Exploring the Validity of Personality Assessment Based on Multimodal Signals in Virtual Reality
Xintong Zhang
Di Lu
Huiqi Hu
Nan Jiang
Xianhao Yu
Jinan Xu
Yujia Peng
Qing Li
Wenjuan Han
36
1
0
29 Jul 2024
I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition
Yannis Vasilakis
Rachel M. Bittner
Johan Pauwels
40
0
0
25 Jul 2024
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
Seong-Gyun Leem
Daniel Fulford
J. Onnela
David Gard
Carlos Busso
39
0
0
25 Jul 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Tuomas Virtanen
Björn Schuller
45
4
0
22 Jul 2024
Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training
Ye Lin Tun
Chu Myaet Thwal
Minh N. H. Nguyen
Choong Seon Hong
48
0
0
22 Jul 2024
Integrating IP Broadcasting with Audio Tags: Workflow and Challenges
Rhys Burchett-Vass
Arshdeep Singh
Gabriel Bibbó
Mark D. Plumbley
31
0
0
22 Jul 2024
AudioInsight: Detecting Social Contexts Relevant to Social Anxiety from Speech
Varun Reddy
Zhiyuan Wang
Emma R. Toner
Max Larrazabal
M. Boukhechba
B. Teachman
Laura E. Barnes
35
4
0
19 Jul 2024
Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models
Xuenan Xu
Pingyue Zhang
Ming Yan
Ji Zhang
Mengyue Wu
VLM
26
0
0
19 Jul 2024
Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons
Gauthier Boeshertz
Giacomo Indiveri
M. Nair
Alpha Renner
41
2
0
18 Jul 2024
Towards Enhanced Classification of Abnormal Lung sound in Multi-breath: A Light Weight Multi-label and Multi-head Attention Classification Method
Yi-Wei Chua
Yun-Chien Cheng
24
0
0
15 Jul 2024
Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control
Yu-Hua Chen
Yen-Tung Yeh
Yuan-Chiao Cheng
Jui-Te Wu
Yu-Hsiang Ho
J. Jang
Yi-Hsuan Yang
38
5
0
15 Jul 2024
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
31
1
0
11 Jul 2024
From Real to Cloned Singer Identification
Dorian Desblancs
Gabriel Meseguer-Brocal
Romain Hennequin
Manuel Moussallam
42
0
0
11 Jul 2024
SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness
Jie Lin
Xiuping Yang
Li Xiao
Xinhong Li
Weiyan Yi
Yuhong Yang
Weiping Tu
Xiong Chen
27
0
0
10 Jul 2024
ASGIR: Audio Spectrogram Transformer Guided Classification And Information Retrieval For Birds
Yashwardhan Chaudhuri
Paridhi Mundra
Arnesh Batra
Orchid Chetia Phukan
Arun Balaji Buduru
35
1
0
10 Jul 2024
VoxMed: One-Step Respiratory Disease Classifier using Digital Stethoscope Sounds
Paridhi Mundra
Manik Sharma
Yashwardhan Chaudhuri
Orchid Chetia Phukan
Arun Balaji Buduru
30
0
0
10 Jul 2024
Cue Point Estimation using Object Detection
Giulia Argüello
Luca A. Lanzendörfer
Roger Wattenhofer
33
1
0
09 Jul 2024
Towards Attention-based Contrastive Learning for Audio Spoof Detection
C. Goel
Surya Koppisetti
Ben Colman
Ali Shahriyari
Gaurav Bharaj
60
5
0
03 Jul 2024
Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
Hok-Shing Lau
Mark Huntly
Nathon Morgan
Adesua Iyenoma
Biao Zeng
Tim Bashford
28
0
0
29 Jun 2024
A Simple Attention-Based Mechanism for Bimodal Emotion Classification
Mazen Elabd
S. Jaf
24
0
0
28 Jun 2024
ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data
Zeyi Liu
Cheng Chi
Eric A. Cousineau
Naveen Kuppuswamy
Benjamin Burchfiel
Shuran Song
VGen
44
23
0
27 Jun 2024
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Yi Zhu
Tiago H. Falk
MedIm
41
0
0
26 Jun 2024
A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons
Tzu-Yun Hung
Jui-Te Wu
Yu-Chia Kuo
Yo-Wei Hsiao
Ting-Wei Lin
Li Su
26
0
0
26 Jun 2024
Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher
Hyunjong Ok
Jegwang Ryu
Jaeho Lee
45
0
0
26 Jun 2024
This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach
Lukas Christ
Shahin Amiriparian
Friederike Hawighorst
Ann-Kathrin Schill
Angelo Boutalikakis
Lorenz Graf-Vlachy
Andreas Konig
Björn W. Schuller
24
1
0
25 Jun 2024
Sound Tagging in Infant-centric Home Soundscapes
Mohammad Nur Hossain Khan
Jialu Li
Nancy L. McElwain
M. Hasegawa-Johnson
Bashima Islam
20
0
0
25 Jun 2024
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
Yuwei Zhang
Tong Xia
Jing Han
Yu Wu
Georgios Rizos
Yang Liu
Mohammed Mosuily
Jagmohan Chauhan
Cecilia Mascolo
AI4CE
41
6
0
23 Jun 2024
Predefined Prototypes for Intra-Class Separation and Disentanglement
Antonio Almudévar
Théo Mariotte
Alfonso Ortega
Marie Tahon
Luis Vicente
A. Miguel
Eduardo Lleida
29
0
0
23 Jun 2024
LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation
Rebecca Salganik
Xiaohao Liu
Yunshan Ma
Jian Kang
Tat-Seng Chua
CLL
46
2
0
20 Jun 2024
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
Jizhong Liu
Gang Li
Junbo Zhang
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Yujun Wang
Bin Wang
AuLLM
57
2
0
19 Jun 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Chandra Kiran Reddy Evuru
Utkarsh Tyagi
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
LRM
46
37
0
17 Jun 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
Anbai Jiang
Bing Han
Zhiqiang Lv
Yufeng Deng
Wei-Qiang Zhang
Xie Chen
Yanmin Qian
Jia Liu
Pingyi Fan
40
3
0
17 Jun 2024
AVR: Synergizing Foundation Models for Audio-Visual Humor Detection
Sarthak Sharma
Orchid Chetia Phukan
Drishti Singh
Arun Balaji Buduru
Rajesh Sharma
38
0
0
15 Jun 2024
Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors
Chaeyeon Han
Pavan Seshadri
Yiwei Ding
Noah Posner
B. Koo
Animesh Agrawal
Alexander Lerch
S. Guhathakurta
26
2
0
14 Jun 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen
Puyuan Peng
Ami Baid
Zihui Xue
Wei-Ning Hsu
David Harwath
Kristen Grauman
VGen
42
8
0
13 Jun 2024
Vision Transformer Segmentation for Visual Bird Sound Denoising
Sahil Kumar
Jialu Li
Youshan Zhang
36
1
0
13 Jun 2024
Towards Multilingual Audio-Visual Question Answering
Orchid Chetia Phukan
Priyabrata Mallick
Swarup Ranjan Behera
Aalekhya Satya Narayani
Arun Balaji Buduru
Rajesh Sharma
49
0
0
13 Jun 2024
3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection
Thye Shan Ng
Feiqi Cao
S. Han
29
0
0
13 Jun 2024
Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor
Yongjie Si
Yanxiong Li
Jialong Li
Jiaxin Tan
Qianhua He
28
2
0
12 Jun 2024
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Kai Wang
Shijian Deng
Jing Shi
Dimitrios Hatzinakos
Yapeng Tian
VGen
80
10
0
11 Jun 2024
FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation
Swarup Ranjan Behera
Abhishek Dhiman
Karthik Gowda
Aalekhya Satya Narayani
26
1
0
11 Jun 2024
MambaLRP: Explaining Selective State Space Sequence Models
F. Jafari
G. Montavon
Klaus-Robert Müller
Oliver Eberle
Mamba
62
9
0
11 Jun 2024
BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification
June-Woo Kim
Miika Toikkanen
Yera Choi
Seoung-Eun Moon
Ho-Young Jung
44
4
0
10 Jun 2024
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
Andreas Triantafyllopoulos
A. Batliner
Simon Rampp
M. Milling
Björn Schuller
VLM
28
0
0
10 Jun 2024