ResearchTrend.AI

AST: Audio Spectrogram Transformer

5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
    ViT
arXiv: 2104.01778

Papers citing "AST: Audio Spectrogram Transformer"

50 / 463 papers shown
Hybrid Losses for Hierarchical Embedding Learning
Haokun Tian
Stefan Lattner
Brian McFee
Charalampos Saitis
50
0
0
22 Jan 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
41
0
0
20 Jan 2025
AudioBERT: Audio Knowledge Augmented Language Model
Hyunjong Ok
Suho Yoo
Jaeho Lee
AuLLM
RALM
VLM
53
0
0
17 Jan 2025
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
Yilang Zhang
Bingcong Li
G. Giannakis
AAML
39
0
0
11 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
FAST: Fast Audio Spectrogram Transformer
Anugunj Naman
Gaibo Zhang
26
0
0
03 Jan 2025
Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation
N. Dennler
Stefanos Nikolaidis
Maja J. Matarić
174
0
0
03 Jan 2025
Trainingless Adaptation of Pretrained Models for Environmental Sound Classification
Noriyuki Tonami
Wataru Kohno
Keisuke Imoto
Yoshiyuki Yajima
Sakiko Mishima
Reishi Kondo
Tomoyuki Hino
VLM
36
0
0
23 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
79
0
0
18 Dec 2024
When Vision Models Meet Parameter Efficient Look-Aside Adapters Without Large-Scale Audio Pretraining
Juan Yeo
Jinkwan Jang
Kyubyung Chae
Seongkyu Mun
Taesup Kim
VLM
62
0
0
08 Dec 2024
STEVE-Audio: Expanding the Goal Conditioning Modalities of Embodied Agents in Minecraft
Nicholas Lenzen
Amogh Raut
Andrew Melnik
VGen
72
0
0
01 Dec 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Viana
75
0
0
24 Nov 2024
State-Space Large Audio Language Models
Saurabhchand Bhati
Yuan Gong
Leonid Karlinsky
Hilde Kuehne
Rogerio Feris
James Glass
99
0
0
24 Nov 2024
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
H. Wang
42
1
0
14 Nov 2024
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Fang Kang
Feiran Yang
Wenwu Wang
Mark D. Plumbley
J. Yang
33
0
0
10 Nov 2024
Model and Deep learning based Dynamic Range Compression Inversion
Haoran Sun
Dominique Fourer
Hichem Maaref
21
0
0
07 Nov 2024
Stepping Forward on the Last Mile
Chen Feng
Shaojie Zhuo
Xiaopeng Zhang
R. Ramakrishnan
Zhaocong Yuan
Andrew Zou Li
46
0
0
06 Nov 2024
Angular Distance Distribution Loss for Audio Classification
Antonio Almudévar
Romain Serizel
Alfonso Ortega
28
0
0
31 Oct 2024
EEG-based Multimodal Representation Learning for Emotion Recognition
Kang Yin
Hye-Bin Shin
Dan Li
Seong-Whan Lee
26
3
0
29 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
36
0
0
24 Oct 2024
Learning to rumble: Automated elephant call classification, detection and endpointing using deep architectures
Christiaan M. Geldenhuys
Thomas R. Niesler
29
0
0
15 Oct 2024
GraFPrint: A GNN-Based Approach for Audio Identification
Aditya Bhattacharjee
Shubhr Singh
Emmanouil Benetos
26
0
0
14 Oct 2024
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
26
2
0
12 Oct 2024
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning
Eileen Wang
Caren Han
Josiah Poon
37
0
0
12 Oct 2024
Movie Trailer Genre Classification Using Multimodal Pretrained Features
Serkan Sulun
Paula Viana
M. Davies
CLIP
18
2
0
11 Oct 2024
Music Genre Classification using Large Language Models
Mohamed El Amine Meguenani
Alceu de Souza Britto Jr.
A. L. Koerich
36
0
0
10 Oct 2024
Self-Attention Mechanism in Multimodal Context for Banking Transaction Flow
Cyrile Delestre
Yoann Sola
24
0
0
10 Oct 2024
Audio Explanation Synthesis with Generative Foundation Models
Alican Akman
Qiyang Sun
Björn W. Schuller
34
1
0
10 Oct 2024
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William B. Held
Ella Li
Michael Joseph Ryan
Weiyan Shi
Yanzhe Zhang
Diyi Yang
AuLLM
47
8
0
03 Oct 2024
Probabilistic road classification in historical maps using synthetic data and deep learning
Dominik J. Mühlematter
Sebastian Schweizer
Chenjing Jiao
Xue Xia
M. Heitzler
L. Hurni
34
0
0
03 Oct 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
49
2
0
02 Oct 2024
Pre-training with Synthetic Patterns for Audio
Yuchi Ishikawa
Tatsuya Komatsu
Yoshimitsu Aoki
38
0
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
14
0
01 Oct 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
39
6
0
27 Sep 2024
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
Pengfei Cai
Yan Song
Nan Jiang
Qing Gu
Ian Mcloughlin
38
2
0
26 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
29
0
0
25 Sep 2024
Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
Burooj Ghani
Vincent J. Kalkman
Bob Planqué
Willem-Pier Vellinga
L. Gill
Dan Stowell
VLM
32
5
0
21 Sep 2024
DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
Dongheon Lee
Jung-Woo Choi
Mamba
29
1
0
19 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
41
0
0
17 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
44
4
0
17 Sep 2024
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
36
0
0
17 Sep 2024
MusicLIME: Explainable Multimodal Music Understanding
Theodoros Sotirou
Vassilis Lyberatos
Orfeas Menis Mastromichalakis
Giorgos Stamou
34
2
0
16 Sep 2024
A Survey of Foundation Models for Music Understanding
Wenjun Li
Ying Cai
Ziyang Wu
Wenyi Zhang
Yifan Chen
...
Junwei Han
Bao Ge
Tianming Liu
Lin Gan
Tuo Zhang
58
2
0
15 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schluter
Paul Primus
Gerhard Widmer
ViT
33
2
0
14 Sep 2024
Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
Guimin Hu
Yi Xin
Weimin Lyu
Haojian Huang
Chang Sun
Zehan Zhu
Lin Gui
Ruichu Cai
Erik Cambria
Hasti Seifi
32
5
0
11 Sep 2024
Benchmarking Sub-Genre Classification For Mainstage Dance Music
Hongzhi Shu
Xinglin Li
Hongyu Jiang
Minghao Fu
Xinyu Li
35
0
0
10 Sep 2024
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
A. Sridhar
Yinyi Guo
Erik M. Visser
AuLLM
27
0
0
10 Sep 2024
Continuous Learning of Transformer-based Audio Deepfake Detection
Tuan Duy Nguyen Le
Kah Kuan Teh
Huy Dat Tran
ViT
31
2
0
09 Sep 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
32
1
0
29 Aug 2024
Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers
Qian Wang
Zhaoyang Bu
Jiaxuan Mao
Wenyu Zhu
Jingya Zhao
Wei Du
Guochao Shi
Min Zhou
Si Chen
Jieming Qu
MedIm
41
0
0
28 Aug 2024