ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

AST: Audio Spectrogram Transformer

5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
    ViT
arXiv:2104.01778

Papers citing "AST: Audio Spectrogram Transformer"

50 / 486 papers shown
Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance
June-Woo Kim
Chihyeon Yoon
Miika Toikkanen
Sangmin Bae
Ho-Young Jung
DiffM, MedIm
57
9
0
11 Nov 2023
Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin
Aaron Courville
Yiran Zhong
92
80
0
08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
95
67
0
07 Nov 2023
ATGNN: Audio Tagging Graph Neural Network
Shubhr Singh
Christian J. Steinmetz
Emmanouil Benetos
Huy P Phan
Dan Stowell
ViT, GNN
52
9
0
02 Nov 2023
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
Jaeyong Kang
Soujanya Poria
Dorien Herremans
MGen, VGen
98
36
0
02 Nov 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
88
16
0
30 Oct 2023
Sound of Story: Multi-modal Storytelling with Audio
Jaeyeon Bae
Seokhoon Jeong
Seokun Kang
Namgi Han
Jae-Yon Lee
Hyounghun Kim
Taehwan Kim
59
4
0
30 Oct 2023
Secure short-term load forecasting for smart grids with transformer-based federated learning
Jonas Sievers
Thomas Blank
34
3
0
26 Oct 2023
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Florian Schmid
Khaled Koutini
Gerhard Widmer
51
11
0
24 Oct 2023
AVTENet: A Human-Cognition-Inspired Audio-Visual Transformer-Based Ensemble Network for Video Deepfake Detection
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
ViT
123
6
0
19 Oct 2023
In-Context Learning for Few-Shot Molecular Property Prediction
Christopher Fifty
J. Leskovec
Sebastian Thrun
87
5
0
13 Oct 2023
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong
Bin Chen
Xiulong Liu
Paweł Polak
Peng Zhang
LRM
121
27
0
10 Oct 2023
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models
Chenzhuang Du
Yue Zhao
Chonghua Liao
Jiacheng You
Jie Fu
Hang Zhao
91
2
0
08 Oct 2023
ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures
Haoxuan Liu
Vasu Singh
Michal Filipiuk
S. Hari
23
4
0
05 Oct 2023
Efficient Supervised Training of Audio Transformers for Music Representation Learning
Pablo Alonso-Jiménez
Xavier Serra
Dmitry Bogdanov
ViT
70
4
0
28 Sep 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
83
0
0
28 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
133
82
0
25 Sep 2023
Audio classification with Dilated Convolution with Learnable Spacings
Ismail Khalfaoui-Hassani
T. Masquelier
Thomas Pellegrini
76
1
0
25 Sep 2023
Attention Is All You Need For Blind Room Volume Estimation
Chunxiu Wang
Mao-shen Jia
Meiran Li
C. Bao
Wenyu Jin
71
7
0
23 Sep 2023
Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection
Qing Deng
Saturnino Luz
Sofia de la Fuente Garcia
58
0
0
23 Sep 2023
Asca: less audio data is more insightful
Xiang Li
Jing Chen
Chao Li
Hongwu Lv
50
0
0
23 Sep 2023
Investigating Efficient Deep Learning Architectures For Side-Channel Attacks on AES
Yohai-Eliel Berreby
L. Sauvage
AAML
42
2
0
22 Sep 2023
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Hao Chen
Yusen Wu
Phuong Nguyen
Chao Liu
Yelena Yesha
FedML, MoMe
54
0
0
21 Sep 2023
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
Meng Liu
K. Liang
Dayu Hu
Hao Yu
Yue Liu
Lingyuan Meng
Wenxuan Tu
Sihang Zhou
Xinwang Liu
79
26
0
21 Sep 2023
Exploring Meta Information for Audio-based Zero-shot Bird Classification
Alexander Gebhard
Andreas Triantafyllopoulos
Teresa Bez
Lukas Christ
Alexander Kathan
Björn W. Schuller
97
6
0
15 Sep 2023
Multilingual Audio Captioning using machine translated data
Matéo Cousin
Etienne Labbé
Thomas Pellegrini
101
4
0
14 Sep 2023
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
L. Pepino
Pablo Riera
Luciana Ferrer
80
5
0
14 Sep 2023
Leveraging Foundation models for Unsupervised Audio-Visual Segmentation
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Xiatian Zhu
VOS
66
5
0
13 Sep 2023
ASPED: An Audio Dataset for Detecting Pedestrians
Pavan Seshadri
Chaeyeon Han
B. Koo
Noah Posner
S. Guhathakurta
Alexander Lerch
31
2
0
12 Sep 2023
Co-learning synaptic delays, weights and adaptation in spiking neural networks
Lucas Deckers
Lauren Damme
Ing Jyh Tsang
W. V. Leekwijck
Steven Latré
67
13
0
12 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
86
5
0
10 Sep 2023
DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices
Guanyu Xu
Zhiwei Hao
Yong Luo
Han Hu
J. An
Shiwen Mao
ViT
69
16
0
10 Sep 2023
RoDia: A New Dataset for Romanian Dialect Identification from Speech
Codrut Rotaru
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
62
4
0
06 Sep 2023
Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation
A. Sridhar
Yinyi Guo
Erik M. Visser
Rehana Mahfuz
105
5
0
06 Sep 2023
SeisCLIP: A seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction
Xu Si
Xinming Wu
Hanlin Sheng
Jun Zhu
Zefeng Li
66
14
0
05 Sep 2023
LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data
Shun-Wen Hsiao
Chengbin Sun
CVBM
37
1
0
04 Sep 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
67
0
0
30 Aug 2023
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
120
22
0
28 Aug 2023
MM-AU:Towards Multimodal Understanding of Advertisement Videos
Digbalay Bose
Rajat Hebbar
Tiantian Feng
Krishna Somandepalli
Anfeng Xu
Shrikanth Narayanan
58
7
0
27 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA, AuLLM
202
39
0
24 Aug 2023
Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning
Yuanbo Hou
Siyang Song
Cheng Luo
A. Mitchell
Qiaoqiao Ren
Weicheng Xie
Jian Kang
Wenwu Wang
Dick Botteldooren
73
6
0
23 Aug 2023
CED: Consistent ensemble distillation for audio tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
83
24
0
23 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
83
7
0
23 Aug 2023
MusicJam: Visualizing Music Insights via Generated Narrative Illustrations
Chuer Chen
Nan Cao
Jiani Hou
Yi Guo
Yulei Zhang
Yang Shi
DiffM
63
0
0
22 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
134
1
0
14 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
73
13
0
08 Aug 2023
ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data
Ruiqi Yang
Eric Modesitt
ViT
90
12
0
01 Aug 2023
Cascaded Cross-Modal Transformer for Request and Complaint Detection
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
90
3
0
27 Jul 2023
A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis
Li Xiao
Xiuping Yang
Xinhong Li
Weiping Tu
Xiong Chen
Weiyan Yi
Jie Lin
Yuhong Yang
Yanzhen Ren
61
2
0
25 Jul 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
106
18
0
24 Jul 2023