ResearchTrend.AI
AST: Audio Spectrogram Transformer

5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
    ViT

Papers citing "AST: Audio Spectrogram Transformer"

50 / 486 papers shown
Title
Universal Music Representations? Evaluating Foundation Models on World Music Corpora
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
34
0
0
20 Jun 2025
Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation
Carolina Higuera
Akash Sharma
Taosha Fan
Chaithanya Krishna Bodduluri
Byron Boots
...
Mike Lambeta
Tingfan Wu
Zixi Liu
Francois Robert Hogan
Mustafa Mukadam
34
0
0
17 Jun 2025
The Perception of Phase Intercept Distortion and its Application in Data Augmentation
Venkatakrishnan Vaidyanathapuram Krishnan
Nathaniel Condit-Schultz
26
0
0
17 Jun 2025
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex
S. Ahmed
A. Mustafa
Muhammad Awais
Philip J. B. Jackson
31
1
0
13 Jun 2025
PromptTSS: A Prompting-Based Approach for Interactive Multi-Granularity Time Series Segmentation
Ching Chang
Ming-Chih Lo
Wen-Chih Peng
Tien-Fu Chen
AI4TS
54
0
0
12 Jun 2025
Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages
Rishabh Ranjan
Likhith Ayinala
Mayank Vatsa
Richa Singh
19
0
0
10 Jun 2025
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
VOS
33
0
0
09 Jun 2025
SynHate: Detecting Hate Speech in Synthetic Deepfake Audio
Rishabh Ranjan
Kishan Pipariya
Mayank Vatsa
Richa Singh
30
0
0
07 Jun 2025
Benchmarking Time-localized Explanations for Audio Classification Models
Cecilia Bolaños
L. Pepino
Martin Meza
Luciana Ferrer
41
0
0
04 Jun 2025
CogniAlign: Word-Level Multimodal Speech Alignment with Gated Cross-Attention for Alzheimer's Detection
David Ortiz-Perez
Manuel Benavent-Lledo
Javier Rodriguez-Juan
José García Rodríguez
David Tomás
79
0
0
02 Jun 2025
General-purpose audio representation learning for real-world sound scenes
Goksenin Yuksel
Marcel van Gerven
Kiki van der Heijden
39
0
0
01 Jun 2025
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
Ioan-Paul Ciobanu
Andrei Iulian Hiji
Nicolae-Cătălin Ristea
Paul Irofti
Cristian Rusu
Radu Tudor Ionescu
44
0
0
31 May 2025
Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds
Andrew Chang
Yike Li
Iran R. Roman
David Poeppel
54
0
0
29 May 2025
Patient Domain Supervised Contrastive Learning for Lung Sound Classification Using Mobile Phone
Seung Gyu Jeong
Seong-Eun Kim
24
0
0
29 May 2025
Bridging the Gap Between Semantic and User Preference Spaces for Multi-modal Music Representation Learning
Xiaofeng Pan
Jing Chen
Haitong Zhang
Menglin Xing
Jiayi Wei
Xuefeng Mu
Zhongqian Xie
40
0
0
29 May 2025
Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles
Miika Toikkanen
June-Woo Kim
59
0
0
28 May 2025
Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses
Seung Gyu Jeong
Seong-Eun Kim
OOD
35
0
0
28 May 2025
Hybrid Audio Detection Using Fine-Tuned Audio Spectrogram Transformers: A Dataset-Driven Evaluation of Mixed AI-Human Speech
Kunyang Huang
Bin Hu
54
0
0
21 May 2025
15,500 Seconds: Lean UAV Classification Leveraging PEFT and Pre-Trained Networks
Andrew P. Berg
Qian Zhang
Mia Y. Wang
37
0
0
21 May 2025
Large Language Models Implicitly Learn to See and Hear Just By Reading
Prateek Verma
Mert Pilanci
200
0
0
20 May 2025
Picturized and Recited with Dialects: A Multimodal Chinese Representation Framework for Sentiment Analysis of Classical Chinese Poetry
Xiaocong Du
Haoyu Pei
Haipeng Zhang
65
0
0
19 May 2025
Exploring the Potential of SSL Models for Sound Event Detection
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
96
0
0
17 May 2025
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio
Tu Duyen Nguyen
Adrien Lesage
Clotilde Cantini
Rachid Riad
118
0
0
15 May 2025
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Paul Primus
Florian Schmid
Gerhard Widmer
CLIP AI4TS VLM
60
0
0
12 May 2025
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Xilin Jiang
Junkai Wu
Vishal B. Choudhari
N. Mesgarani
VLM
84
0
0
11 May 2025
Learning Music Audio Representations With Limited Data
Christos Plachouras
Emmanouil Benetos
Johan Pauwels
99
0
0
09 May 2025
Tri-MTL: A Triple Multitask Learning Approach for Respiratory Disease Diagnosis
June-Woo Kim
Sanghoon Lee
Miika Toikkanen
Daehwan Hwang
Kyunghoon Kim
101
0
0
06 May 2025
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks
Junhe Zhang
Wanli Ni
Pengwei Wang
Dongyu Wang
80
0
0
06 May 2025
MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models
Soheil Zibakhsh Shabgahi
Yaman Jandali
F. Koushanfar
MoMe AAML
106
0
0
06 May 2025
Probing Audio-Generation Capabilities of Text-Based Language Models
Arjun Prasaath Anbazhagan
Parteek Kumar
Ujjwal Kaur
Aslihan Akalin
Kevin Zhu
Sean O'Brien
AuLLM
27
1
0
04 May 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP VLM
243
0
0
30 Apr 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS VLM
107
0
0
30 Apr 2025
PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies
Jialiang Zhao
Naveen Kuppuswamy
S. Feng
Benjamin Burchfiel
Edward H. Adelson
122
1
0
27 Apr 2025
M2R2: MulitModal Robotic Representation for Temporal Action Segmentation
Daniel Sliwowski
Dongheui Lee
66
1
0
25 Apr 2025
Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis
Daisuke Niizumi
Daiki Takeuchi
Masahiro Yasuda
Binh Thien Nguyen
Yasunori Ohishi
Noboru Harada
94
0
0
25 Apr 2025
Waveform-Logmel Audio Neural Networks for Respiratory Sound Classification
Jiadong Xie
Yunlian Zhou
Mingsheng Xu
69
0
0
24 Apr 2025
iMedic: Towards Smartphone-based Self-Auscultation Tool for AI-Powered Pediatric Respiratory Assessment
Seung Gyu Jeong
Sung Woo Nam
Seong Kwan Jung
Seong-Eun Kim
189
1
0
22 Apr 2025
Histogram-based Parameter-efficient Tuning for Passive Sonar Classification
Amirmohammad Mohammadi
Davelle Carreiro
A. V. Dine
Joshua Peeples
104
0
0
21 Apr 2025
Self-Mixing Laser Interferometry: In Search of an Ambient Noise-Resilient Alternative to Acoustic Sensing
Remko Proesmans
Thomas Lips
Francis Wyffels
55
0
0
18 Apr 2025
Harmony: A Unified Framework for Modality Incremental Learning
Y. Song
Xiaoshan Yang
D. Jiang
Yaowei Wang
Changsheng Xu
CLL
163
0
0
17 Apr 2025
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
56
0
0
17 Apr 2025
Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
Elisa Ancarani
Julie Tores
L. Sassatelli
Rémy Sun
Hui-Yin Wu
F. Precioso
89
0
0
15 Apr 2025
CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation
Junchen Fu
Yongxin Ni
J. Jose
Ioannis Arapakis
Kaiwen Zheng
You Li
Xuri Ge
74
0
0
14 Apr 2025
Generation of Musical Timbres using a Text-Guided Diffusion Model
Weixuan Yuan
Qadeer Khan
Vladimir Golkov
DiffM
114
0
0
12 Apr 2025
Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities
Maria Santos-Villafranca
Dustin Carrión-Ojeda
Alejandro Pérez-Yus
J. Bermudez-Cameo
Jose J. Guerrero
Simone Schaub-Meyer
EgoV VLM
82
0
0
11 Apr 2025
AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification
Wang Tang
Fethiye Irmak Dogan
Linbo Qing
Hatice Gunes
71
0
0
07 Apr 2025
Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
Yuto Shibata
Keitaro Tanaka
Yoshiaki Bando
Keisuke Imoto
Hirokatsu Kataoka
Yoshimitsu Aoki
68
0
0
06 Apr 2025
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke
Suzannah Wistreich
Yanjie Ze
Jiajun Wu
74
0
0
03 Apr 2025
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
Trung Thanh Nguyen
Yasutomo Kawanishi
Vijay John
Takahiro Komamizu
Ichiro Ide
119
0
0
03 Apr 2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao
Junyu Luo
Zhiyuan Ning
Weizhi Zhang
Zhiping Xiao
Wei Ju
Philip S. Yu
Ming Zhang
AuLLM
111
0
0
03 Apr 2025