ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.01778
  4. Cited By
AST: Audio Spectrogram Transformer

AST: Audio Spectrogram Transformer

5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
    ViT
ArXivPDFHTML

Papers citing "AST: Audio Spectrogram Transformer"

50 / 463 papers shown
Title
Exploring the Potential of SSL Models for Sound Event Detection
Exploring the Potential of SSL Models for Sound Event Detection
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
2
0
0
17 May 2025
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio
Tu Duyen Nguyen
Adrien Lesage
Clotilde Cantini
Rachid Riad
30
0
0
15 May 2025
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Paul Primus
Florian Schmid
Gerhard Widmer
CLIP
AI4TS
VLM
36
0
0
12 May 2025
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Xilin Jiang
Junkai Wu
Vishal B. Choudhari
N. Mesgarani
VLM
35
0
0
11 May 2025
Learning Music Audio Representations With Limited Data
Learning Music Audio Representations With Limited Data
Christos Plachouras
Emmanouil Benetos
Johan Pauwels
31
0
0
09 May 2025
Tri-MTL: A Triple Multitask Learning Approach for Respiratory Disease Diagnosis
Tri-MTL: A Triple Multitask Learning Approach for Respiratory Disease Diagnosis
June-Woo Kim
Sanghoon Lee
Miika Toikkanen
Daehwan Hwang
Kyunghoon Kim
31
0
0
06 May 2025
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks
Junhe Zhang
Wanli Ni
Pengwei Wang
Dongyu Wang
29
0
0
06 May 2025
MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models
MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models
Soheil Zibakhsh Shabgahi
Yaman Jandali
F. Koushanfar
MoMe
AAML
57
0
0
06 May 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
81
0
0
30 Apr 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
48
0
0
30 Apr 2025
PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies
PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies
Jialiang Zhao
Naveen Kuppuswamy
S. Feng
Benjamin Burchfiel
Edward H. Adelson
37
1
0
27 Apr 2025
M2R2: MulitModal Robotic Representation for Temporal Action Segmentation
M2R2: MulitModal Robotic Representation for Temporal Action Segmentation
Daniel Sliwowski
Dongheui Lee
24
1
0
25 Apr 2025
Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis
Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis
Daisuke Niizumi
Daiki Takeuchi
Masahiro Yasuda
Binh Thien Nguyen
Yasunori Ohishi
N. Harada
39
0
0
25 Apr 2025
Waveform-Logmel Audio Neural Networks for Respiratory Sound Classification
Waveform-Logmel Audio Neural Networks for Respiratory Sound Classification
Jiadong Xie
Yunlian Zhou
Mingsheng Xu
30
0
0
24 Apr 2025
iMedic: Towards Smartphone-based Self-Auscultation Tool for AI-Powered Pediatric Respiratory Assessment
iMedic: Towards Smartphone-based Self-Auscultation Tool for AI-Powered Pediatric Respiratory Assessment
Seung Gyu Jeong
Sung Woo Nam
Seong Kwan Jung
Seong-Eun Kim
33
0
0
22 Apr 2025
Histogram-based Parameter-efficient Tuning for Passive Sonar Classification
Histogram-based Parameter-efficient Tuning for Passive Sonar Classification
Amirmohammad Mohammadi
Davelle Carreiro
A. V. Dine
Joshua Peeples
41
0
0
21 Apr 2025
Self-Mixing Laser Interferometry: In Search of an Ambient Noise-Resilient Alternative to Acoustic Sensing
Self-Mixing Laser Interferometry: In Search of an Ambient Noise-Resilient Alternative to Acoustic Sensing
Remko Proesmans
Thomas Lips
Francis Wyffels
26
0
0
18 Apr 2025
Harmony: A Unified Framework for Modality Incremental Learning
Harmony: A Unified Framework for Modality Incremental Learning
Y. Song
Xiaoshan Yang
D. Jiang
Yaowei Wang
Changsheng Xu
CLL
50
0
0
17 Apr 2025
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
33
0
0
17 Apr 2025
Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
Elisa Ancarani
Julie Tores
L. Sassatelli
Rémy Sun
Hui-Yin Wu
F. Precioso
29
0
0
15 Apr 2025
CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation
CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation
Junchen Fu
Yongxin Ni
J. Jose
Ioannis Arapakis
Kaiwen Zheng
Yongbin Li
Xuri Ge
34
0
0
14 Apr 2025
Generation of Musical Timbres using a Text-Guided Diffusion Model
Generation of Musical Timbres using a Text-Guided Diffusion Model
Weixuan Yuan
Qadeer Khan
Vladimir Golkov
DiffM
31
0
0
12 Apr 2025
Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities
Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities
Maria Santos-Villafranca
Dustin Carrión-Ojeda
Alejandro Pérez-Yus
J. Bermudez-Cameo
Jose J. Guerrero
Simone Schaub-Meyer
EgoV
VLM
37
0
0
11 Apr 2025
AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification
AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification
Wang Tang
Fethiye Irmak Dogan
Linbo Qing
Hatice Gunes
37
0
0
07 Apr 2025
Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
Yuto Shibata
Keitaro Tanaka
Yoshiaki Bando
Keisuke Imoto
Hirokatsu Kataoka
Yoshimitsu Aoki
31
0
0
06 Apr 2025
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
Trung Thanh Nguyen
Yasutomo Kawanishi
Vijay John
Takahiro Komamizu
Ichiro Ide
41
0
0
03 Apr 2025
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
41
0
0
03 Apr 2025
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke
Suzannah Wistreich
Yanjie Ze
Jiajun Wu
41
0
0
03 Apr 2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao
Junyu Luo
Zhiyuan Ning
Weizhi Zhang
Zhiping Xiao
Wei Ju
Philip S. Yu
Ming Zhang
AuLLM
49
0
0
03 Apr 2025
Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance
Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance
Taehan Lee
Hyukjun Lee
ViT
VLM
42
0
0
02 Apr 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
56
0
0
30 Mar 2025
Comparative Analysis of Image, Video, and Audio Classifiers for Automated News Video Segmentation
Comparative Analysis of Image, Video, and Audio Classifiers for Automated News Video Segmentation
Jonathan Attard
Dylan Seychell
48
0
0
27 Mar 2025
Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
Suho Yoo
Hyunjong Ok
Jaeho Lee
AuLLM
RALM
51
0
0
21 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
59
0
0
20 Mar 2025
Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition
Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition
Atharva Agashe
Davelle Carreiro
A. V. Dine
Joshua Peeples
44
0
0
17 Mar 2025
Targeted Data Poisoning for Black-Box Audio Datasets Ownership Verification
Wassim Bouaziz
El-Mahdi El-Mhamdi
Nicolas Usunier
51
0
0
13 Mar 2025
R^RRFLAV: Rolling Flow matching for infinite Audio Video generation
Alex Ergasti
Giuseppe Tarollo
Filippo Botti
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
VGen
45
0
0
13 Mar 2025
Learning Gentle Grasping Using Vision, Sound, and Touch
Ken Nakahara
Roberto Calandra
58
0
0
11 Mar 2025
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim
Inyoung Jung
Dayoon Suh
Youjia Zhang
Sangmin Lee
Sungeun Hong
61
0
0
06 Mar 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Sreyan Ghosh
Zhifeng Kong
Sonal Kumar
S. Sakshi
Jaehyeon Kim
Ming-Yu Liu
Rafael Valle
Dinesh Manocha
Bryan Catanzaro
MLLM
AuLLM
LRM
59
9
0
06 Mar 2025
JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
48
1
0
28 Feb 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
68
0
0
26 Feb 2025
Hedge Fund Portfolio Construction Using PolyModel Theory and iTransformer
Hedge Fund Portfolio Construction Using PolyModel Theory and iTransformer
Siqiao Zhao
Zhikang Dong
Zeyu Cao
Raphael Douady
60
6
0
17 Feb 2025
Akan Cinematic Emotions (ACE): A Multimodal Multi-party Dataset for Emotion Recognition in Movie Dialogues
Akan Cinematic Emotions (ACE): A Multimodal Multi-party Dataset for Emotion Recognition in Movie Dialogues
David Sasu
Zehui Wu
Ziwei Gong
Run Chen
Pengyuan Shi
Lin Ai
Julia Hirschberg
Natalie Schluter
60
1
0
16 Feb 2025
Harnessing Vision Models for Time Series Analysis: A Survey
Harnessing Vision Models for Time Series Analysis: A Survey
Jingchao Ni
Ziming Zhao
ChengAo Shen
Hanghang Tong
Dongjin Song
Wei Cheng
Dongsheng Luo
Haifeng Chen
AI4TS
79
1
0
13 Feb 2025
Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Timo Fudala
Vasileios Tsouvalas
N. Meratnia
MoE
49
0
0
10 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
72
0
0
05 Feb 2025
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
J. P. Muñoz
Jinjie Yuan
Nilesh Jain
Mamba
72
1
0
28 Jan 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
99
2
0
28 Jan 2025
Safe Gradient Flow for Bilevel Optimization
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
E. Y. Hamedani
Mahyar Fazlyab
36
0
0
27 Jan 2025
1234...8910
Next