2104.01778
Cited By
v1
v2
v3 (latest)
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing
"AST: Audio Spectrogram Transformer"
50 / 486 papers shown
Title
Universal Music Representations? Evaluating Foundation Models on World Music Corpora
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
34
0
0
20 Jun 2025
Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation
Carolina Higuera
Akash Sharma
Taosha Fan
Chaithanya Krishna Bodduluri
Byron Boots
...
Mike Lambeta
Tingfan Wu
Zixi Liu
Francois Robert Hogan
Mustafa Mukadam
34
0
0
17 Jun 2025
The Perception of Phase Intercept Distortion and its Application in Data Augmentation
Venkatakrishnan Vaidyanathapuram Krishnan
Nathaniel Condit-Schultz
26
0
0
17 Jun 2025
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex
S. Ahmed
A. Mustafa
Muhammad Awais
Philip J. B. Jackson
31
1
0
13 Jun 2025
PromptTSS: A Prompting-Based Approach for Interactive Multi-Granularity Time Series Segmentation
Ching Chang
Ming-Chih Lo
Wen-Chih Peng
Tien-Fu Chen
AI4TS
54
0
0
12 Jun 2025
Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages
Rishabh Ranjan
Likhith Ayinala
Mayank Vatsa
Richa Singh
19
0
0
10 Jun 2025
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
VOS
33
0
0
09 Jun 2025
SynHate: Detecting Hate Speech in Synthetic Deepfake Audio
Rishabh Ranjan
Kishan Pipariya
Mayank Vatsa
Richa Singh
30
0
0
07 Jun 2025
Benchmarking Time-localized Explanations for Audio Classification Models
Cecilia Bolaños
L. Pepino
Martin Meza
Luciana Ferrer
41
0
0
04 Jun 2025
CogniAlign: Word-Level Multimodal Speech Alignment with Gated Cross-Attention for Alzheimer's Detection
David Ortiz-Perez
Manuel Benavent-Lledo
Javier Rodriguez-Juan
José García Rodríguez
David Tomás
79
0
0
02 Jun 2025
General-purpose audio representation learning for real-world sound scenes
Goksenin Yuksel
Marcel van Gerven
Kiki van der Heijden
39
0
0
01 Jun 2025
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
Ioan-Paul Ciobanu
Andrei Iulian Hiji
Nicolae-Cătălin Ristea
Paul Irofti
Cristian Rusu
Radu Tudor Ionescu
44
0
0
31 May 2025
Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds
Andrew Chang
Yike Li
Iran R. Roman
David Poeppel
54
0
0
29 May 2025
Patient Domain Supervised Contrastive Learning for Lung Sound Classification Using Mobile Phone
Seung Gyu Jeong
Seong-Eun Kim
24
0
0
29 May 2025
Bridging the Gap Between Semantic and User Preference Spaces for Multi-modal Music Representation Learning
Xiaofeng Pan
Jing Chen
Haitong Zhang
Menglin Xing
Jiayi Wei
Xuefeng Mu
Zhongqian Xie
40
0
0
29 May 2025
Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles
Miika Toikkanen
June-Woo Kim
59
0
0
28 May 2025
Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses
Seung Gyu Jeong
Seong-Eun Kim
OOD
35
0
0
28 May 2025
Hybrid Audio Detection Using Fine-Tuned Audio Spectrogram Transformers: A Dataset-Driven Evaluation of Mixed AI-Human Speech
Kunyang Huang
Bin Hu
54
0
0
21 May 2025
15,500 Seconds: Lean UAV Classification Leveraging PEFT and Pre-Trained Networks
Andrew P. Berg
Qian Zhang
Mia Y. Wang
37
0
0
21 May 2025
Large Language Models Implicitly Learn to See and Hear Just By Reading
Prateek Verma
Mert Pilanci
200
0
0
20 May 2025
Picturized and Recited with Dialects: A Multimodal Chinese Representation Framework for Sentiment Analysis of Classical Chinese Poetry
Xiaocong Du
Haoyu Pei
Haipeng Zhang
65
0
0
19 May 2025
Exploring the Potential of SSL Models for Sound Event Detection
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
96
0
0
17 May 2025
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio
Tu Duyen Nguyen
Adrien Lesage
Clotilde Cantini
Rachid Riad
118
0
0
15 May 2025
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Paul Primus
Florian Schmid
Gerhard Widmer
CLIP
AI4TS
VLM
60
0
0
12 May 2025
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Xilin Jiang
Junkai Wu
Vishal B. Choudhari
N. Mesgarani
VLM
84
0
0
11 May 2025
Learning Music Audio Representations With Limited Data
Christos Plachouras
Emmanouil Benetos
Johan Pauwels
99
0
0
09 May 2025
Tri-MTL: A Triple Multitask Learning Approach for Respiratory Disease Diagnosis
June-Woo Kim
Sanghoon Lee
Miika Toikkanen
Daehwan Hwang
Kyunghoon Kim
101
0
0
06 May 2025
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks
Junhe Zhang
Wanli Ni
Pengwei Wang
Dongyu Wang
80
0
0
06 May 2025
MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models
Soheil Zibakhsh Shabgahi
Yaman Jandali
F. Koushanfar
MoMe
AAML
106
0
0
06 May 2025
Probing Audio-Generation Capabilities of Text-Based Language Models
Arjun Prasaath Anbazhagan
Parteek Kumar
Ujjwal Kaur
Aslihan Akalin
Kevin Zhu
Sean O'Brien
AuLLM
27
1
0
04 May 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
243
0
0
30 Apr 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
107
0
0
30 Apr 2025
PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies
Jialiang Zhao
Naveen Kuppuswamy
S. Feng
Benjamin Burchfiel
Edward H. Adelson
122
1
0
27 Apr 2025
M2R2: MulitModal Robotic Representation for Temporal Action Segmentation
Daniel Sliwowski
Dongheui Lee
66
1
0
25 Apr 2025
Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis
Daisuke Niizumi
Daiki Takeuchi
Masahiro Yasuda
Binh Thien Nguyen
Yasunori Ohishi
Noboru Harada
94
0
0
25 Apr 2025
Waveform-Logmel Audio Neural Networks for Respiratory Sound Classification
Jiadong Xie
Yunlian Zhou
Mingsheng Xu
69
0
0
24 Apr 2025
iMedic: Towards Smartphone-based Self-Auscultation Tool for AI-Powered Pediatric Respiratory Assessment
Seung Gyu Jeong
Sung Woo Nam
Seong Kwan Jung
Seong-Eun Kim
189
1
0
22 Apr 2025
Histogram-based Parameter-efficient Tuning for Passive Sonar Classification
Amirmohammad Mohammadi
Davelle Carreiro
A. V. Dine
Joshua Peeples
104
0
0
21 Apr 2025
Self-Mixing Laser Interferometry: In Search of an Ambient Noise-Resilient Alternative to Acoustic Sensing
Remko Proesmans
Thomas Lips
Francis Wyffels
55
0
0
18 Apr 2025
Harmony: A Unified Framework for Modality Incremental Learning
Y. Song
Xiaoshan Yang
D. Jiang
Yaowei Wang
Changsheng Xu
CLL
163
0
0
17 Apr 2025
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
56
0
0
17 Apr 2025
Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
Elisa Ancarani
Julie Tores
L. Sassatelli
Rémy Sun
Hui-Yin Wu
F. Precioso
89
0
0
15 Apr 2025
CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation
Junchen Fu
Yongxin Ni
J. Jose
Ioannis Arapakis
Kaiwen Zheng
You Li
Xuri Ge
74
0
0
14 Apr 2025
Generation of Musical Timbres using a Text-Guided Diffusion Model
Weixuan Yuan
Qadeer Khan
Vladimir Golkov
DiffM
114
0
0
12 Apr 2025
Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities
Maria Santos-Villafranca
Dustin Carrión-Ojeda
Alejandro Pérez-Yus
J. Bermudez-Cameo
Jose J. Guerrero
Simone Schaub-Meyer
EgoV
VLM
82
0
0
11 Apr 2025
AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification
Wang Tang
Fethiye Irmak Dogan
Linbo Qing
Hatice Gunes
71
0
0
07 Apr 2025
Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
Yuto Shibata
Keitaro Tanaka
Yoshiaki Bando
Keisuke Imoto
Hirokatsu Kataoka
Yoshimitsu Aoki
68
0
0
06 Apr 2025
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke
Suzannah Wistreich
Yanjie Ze
Jiajun Wu
74
0
0
03 Apr 2025
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
Trung Thanh Nguyen
Yasutomo Kawanishi
Vijay John
Takahiro Komamizu
Ichiro Ide
119
0
0
03 Apr 2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao
Junyu Luo
Zhiyuan Ning
Weizhi Zhang
Zhiping Xiao
Wei Ju
Philip S. Yu
Ming Zhang
AuLLM
111
0
0
03 Apr 2025