arXiv:2104.01778
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing
"AST: Audio Spectrogram Transformer"
50 / 463 papers shown
Title
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
Meng Liu
K. Liang
Dayu Hu
Hao Yu
Yue Liu
Lingyuan Meng
Wenxuan Tu
Sihang Zhou
Xinwang Liu
18
25
0
21 Sep 2023
Exploring Meta Information for Audio-based Zero-shot Bird Classification
Alexander Gebhard
Andreas Triantafyllopoulos
Teresa Bez
Lukas Christ
Alexander Kathan
Björn W. Schuller
22
6
0
15 Sep 2023
Multilingual Audio Captioning using machine translated data
Matéo Cousin
Etienne Labbé
Thomas Pellegrini
22
4
0
14 Sep 2023
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
L. Pepino
Pablo Riera
Luciana Ferrer
38
4
0
14 Sep 2023
Leveraging Foundation models for Unsupervised Audio-Visual Segmentation
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Xiatian Zhu
VOS
47
5
0
13 Sep 2023
ASPED: An Audio Dataset for Detecting Pedestrians
Pavan Seshadri
Chaeyeon Han
B. Koo
Noah Posner
S. Guhathakurta
Alexander Lerch
14
2
0
12 Sep 2023
Co-learning synaptic delays, weights and adaptation in spiking neural networks
Lucas Deckers
Lauren Damme
Ing Jyh Tsang
W. V. Leekwijck
Steven Latré
27
10
0
12 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
31
5
0
10 Sep 2023
DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices
Guanyu Xu
Zhiwei Hao
Yong Luo
Han Hu
J. An
Shiwen Mao
ViT
39
14
0
10 Sep 2023
RoDia: A New Dataset for Romanian Dialect Identification from Speech
Codrut Rotaru
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
22
3
0
06 Sep 2023
Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation
A. Sridhar
Yinyi Guo
Erik M. Visser
Rehana Mahfuz
34
5
0
06 Sep 2023
SeisCLIP: A seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction
Xu Si
Xinming Wu
Hanlin Sheng
Jun Zhu
Zefeng Li
35
11
0
05 Sep 2023
LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data
Shun-Wen Hsiao
Chengbin Sun
CVBM
18
1
0
04 Sep 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
33
0
0
30 Aug 2023
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
40
19
0
28 Aug 2023
MM-AU:Towards Multimodal Understanding of Advertisement Videos
Digbalay Bose
Rajat Hebbar
Tiantian Feng
Krishna Somandepalli
Anfeng Xu
Shrikanth Narayanan
32
5
0
27 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
35
38
0
24 Aug 2023
Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning
Yuanbo Hou
Siyang Song
Cheng Luo
A. Mitchell
Qiaoqiao Ren
Weicheng Xie
Jian Kang
Wenwu Wang
Dick Botteldooren
44
6
0
23 Aug 2023
CED: Consistent ensemble distillation for audio tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
26
18
0
23 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
32
6
0
23 Aug 2023
MusicJam: Visualizing Music Insights via Generated Narrative Illustrations
Chuer Chen
Nan Cao
Jiani Hou
Yi Guo
Yulei Zhang
Yang Shi
DiffM
34
0
0
22 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
40
1
0
14 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
32
13
0
08 Aug 2023
ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data
Ruiqi Yang
Eric Modesitt
ViT
31
12
0
01 Aug 2023
Cascaded Cross-Modal Transformer for Request and Complaint Detection
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
36
3
0
27 Jul 2023
A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis
Li Xiao
Xiuping Yang
Xinhong Li
Weiping Tu
Xiong Chen
Weiyan Yi
Jie Lin
Yuhong Yang
Yanzhen Ren
26
2
0
25 Jul 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
46
14
0
24 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
33
137
0
20 Jul 2023
Exploring Transformer Extrapolation
Zhen Qin
Yiran Zhong
Huiyuan Deng
31
9
0
19 Jul 2023
From West to East: Who can understand the music of the others better?
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
17
4
0
19 Jul 2023
Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer
Honglin Mu
Wentian Xia
Wanxiang Che
22
1
0
19 Jul 2023
FlexiAST: Flexibility is What AST Needs
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
23
3
0
18 Jul 2023
AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023
Kin Wai Lau
Yasar Abbas Ur Rehman
Yuyang Xie
Lan Ma
13
1
0
14 Jul 2023
EchoVest: Real-Time Sound Classification and Depth Perception Expressed through Transcutaneous Electrical Nerve Stimulation
Jesse Choe
Siddhant Sood
Ryan Park
14
0
0
10 Jul 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Yuan Gong
Sameer Khurana
Leonid Karlinsky
James R. Glass
27
68
0
06 Jul 2023
Dataset balancing can hurt model performance
R. C. Moore
D. Ellis
Eduardo Fonseca
Shawn Hershey
A. Jansen
Manoj Plakal
27
9
0
30 Jun 2023
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos
Chiori Hori
Puyuan Peng
David Harwath
Xinyu Liu
Keita Ota
Siddarth Jain
Radu Corcodel
Devesh K. Jha
Diego Romeres
Jonathan Le Roux
27
4
0
27 Jun 2023
Learning Unseen Modality Interaction
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
27
3
0
22 Jun 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
26
2
0
21 Jun 2023
On Frequency-Wise Normalizations for Better Recording Device Generalization in Audio Spectrogram Transformers
Paul Primus
Gerhard Widmer
22
0
0
20 Jun 2023
Multi-task Learning for Radar Signal Characterisation
Zi Huang
Akila Pemasiri
Simon Denman
Clinton Fookes
Terrence Martin
19
6
0
19 Jun 2023
Channel-Spatial-Based Few-Shot Bird Sound Event Detection
Lingwen Liu
Yuxuan Feng
Haitao Fu
Yajie Yang
Xin Pan
Chenlei Jin
23
0
0
18 Jun 2023
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks
K. Paim
Ricardo Rohweder
M. R. Mendoza
R. Mansilha
Weverton Cordeiro
25
2
0
16 Jun 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
28
1
0
12 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
24
172
0
11 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
23
26
0
07 Jun 2023
Learning Local to Global Feature Aggregation for Speech Emotion Recognition
Cheng Lu
Hailun Lian
Wenming Zheng
Yuan Zong
Yan Zhao
Sunan Li
ViT
21
7
0
02 Jun 2023
Adapting a ConvNeXt model to audio classification on AudioSet
Thomas Pellegrini
Ismail Khalfaoui-Hassani
Etienne Labbé
T. Masquelier
6
21
0
01 Jun 2023
How to Estimate Model Transferability of Pre-Trained Speech Models?
Zih-Ching Chen
Chao-Han Huck Yang
Bo-wen Li
Yu Zhang
Nanxin Chen
Shoufeng Chang
Rohit Prabhavalkar
Hung-yi Lee
Tara N. Sainath
34
9
0
01 Jun 2023
Bytes Are All You Need: Transformers Operating Directly On File Bytes
Maxwell Horton
Sachin Mehta
Ali Farhadi
Mohammad Rastegari
VLM
22
6
0
31 May 2023