arXiv:2104.01778
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing
"AST: Audio Spectrogram Transformer"
50 / 463 papers shown
Title
Exploring Missing Modality in Multimodal Egocentric Datasets
Merey Ramazanova
Alejandro Pardo
Humam Alwassel
Guohao Li
EgoV
38
4
0
21 Jan 2024
ASM: Audio Spectrogram Mixer
Qingfeng Ji
Jicun Zhang
Yuxin Wang
27
1
0
20 Jan 2024
LMUFormer: Low Complexity Yet Powerful Spiking Model With Legendre Memory Units
Zeyu Liu
Gourav Datta
Anni Li
P. Beerel
35
9
0
20 Jan 2024
AAT: Adapting Audio Transformer for Various Acoustics Recognition Tasks
Yun Liang
Hai Lin
Shaojian Qiu
Yihang Zhang
21
1
0
19 Jan 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Chenyu Wang
Weixin Luo
Qianyu Chen
Haonan Mai
Jindi Guo
Sixun Dong
Xiaohua Xuan
MLLM
LLMAG
52
19
0
19 Jan 2024
From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
29
1
0
16 Jan 2024
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Nicolae-Cătălin Ristea
Andrei Anghel
Radu Tudor Ionescu
36
2
0
15 Jan 2024
Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
Haobo Yue
Zhicheng Zhang
Da Mu
Yonghao Dang
Jianqin Yin
Jin Tang
35
0
0
10 Jan 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen
Yuzhe Liang
Ziyang Ma
Zhisheng Zheng
Xie Chen
ViT
54
18
0
07 Jan 2024
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu
Zihao Zhu
Giorgio Becherini
Yichen Peng
Mingyang Su
You Zhou
Xuefei Zhe
Naoya Iwamoto
Bo Zheng
Michael J. Black
SLR
37
29
0
31 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
40
144
0
28 Dec 2023
EnchantDance: Unveiling the Potential of Music-Driven Dance Movement
Bo Han
Yi Ren
Hao Peng
Teng Zhang
Zeyu Ling
Xiang Yin
Feilin Han
21
3
0
26 Dec 2023
Deformable Audio Transformer for Audio Event Detection
Wentao Zhu
28
0
0
24 Dec 2023
Consistent and Relevant: Rethink the Query Embedding in General Sound Separation
Yuanyuan Wang
Hangting Chen
Dongchao Yang
Jianwei Yu
Chao Weng
Zhiyong Wu
Helen M. Meng
17
6
0
24 Dec 2023
On the choice of the optimal temporal support for audio classification with Pre-trained embeddings
Aurian Quélennec
Michel Olvera
Geoffroy Peeters
S. Essid
33
2
0
21 Dec 2023
Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification
June-Woo Kim
Sangmin Bae
Won-Yang Cho
Byungjo Lee
Ho-Young Jung
47
11
0
15 Dec 2023
Efficient speech detection in environmental audio using acoustic recognition and knowledge distillation
Drew Priebe
Burooj Ghani
Dan Stowell
17
5
0
14 Dec 2023
Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI
Kai Huang
Boyuan Yang
Wei Gao
37
1
0
13 Dec 2023
Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre
Radek Daněček
Nikos Athanasiou
Giorgio Becherini
Christopher Peters
Michael J. Black
Timo Bolkart
DiffM
36
16
0
07 Dec 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
25
17
0
27 Nov 2023
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Xiaohan Ding
Yiyuan Zhang
Yixiao Ge
Sijie Zhao
Lin Song
Xiangyu Yue
Ying Shan
VLM
AI4TS
SSL
29
102
0
27 Nov 2023
Spectro-ViT: A Vision Transformer Model for GABA-edited MRS Reconstruction Using Spectrograms
G. Dias
R. Berto
Mateus Oliveira
Lucas Ueda
S. Dertkigil
Paula D. P. Costa
Amirmohammad Shamaei
Roberto Souza
Ashley D. Harris
Letícia Rittner
16
0
0
26 Nov 2023
Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks
Amrit Nagarajan
Anand Raghunathan
VLM
ViT
23
0
0
22 Nov 2023
Unveiling the Power of Self-Attention for Shipping Cost Prediction: The Rate Card Transformer
Aditya Sreekar
Berrin Yanıkoğlu
Varun Madhavan
Abhishek Persad
12
0
0
20 Nov 2023
Multi-View Spectrogram Transformer for Respiratory Sound Classification
Wentao He
Yuchen Yan
Jianfeng Ren
Ruibin Bai
Xudong Jiang
MedIm
ViT
17
7
0
16 Nov 2023
AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance
Yuanbo Hou
Qiaoqiao Ren
Huizhong Zhang
A. Mitchell
F. Aletta
Jian Kang
Dick Botteldooren
38
14
0
15 Nov 2023
Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance
June-Woo Kim
Chihyeon Yoon
Miika Toikkanen
Sangmin Bae
Ho-Young Jung
DiffM
MedIm
22
7
0
11 Nov 2023
Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin
Aaron Courville
Yiran Zhong
36
74
0
08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
34
64
0
07 Nov 2023
ATGNN: Audio Tagging Graph Neural Network
Shubhr Singh
Christian J. Steinmetz
Emmanouil Benetos
Huy P Phan
Dan Stowell
ViT
GNN
22
8
0
02 Nov 2023
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
Jaeyong Kang
Soujanya Poria
Dorien Herremans
MGen
VGen
19
32
0
02 Nov 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
35
15
0
30 Oct 2023
Sound of Story: Multi-modal Storytelling with Audio
Jaeyeon Bae
Seokhoon Jeong
Seokun Kang
Namgi Han
Jae-Yon Lee
Hyounghun Kim
Taehwan Kim
26
2
0
30 Oct 2023
Secure short-term load forecasting for smart grids with transformer-based federated learning
Jonas Sievers
Thomas Blank
16
3
0
26 Oct 2023
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Florian Schmid
Khaled Koutini
Gerhard Widmer
18
11
0
24 Oct 2023
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
ViT
53
6
0
19 Oct 2023
In-Context Learning for Few-Shot Molecular Property Prediction
Christopher Fifty
J. Leskovec
Sebastian Thrun
36
5
0
13 Oct 2023
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong
Bin Chen
Xiulong Liu
Paweł Polak
Peng Zhang
LRM
45
26
0
10 Oct 2023
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models
Chenzhuang Du
Yue Zhao
Chonghua Liao
Jiacheng You
Jie Fu
Hang Zhao
44
2
0
08 Oct 2023
ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures
Haoxuan Liu
Vasu Singh
Michal Filipiuk
S. Hari
8
4
0
05 Oct 2023
Efficient Supervised Training of Audio Transformers for Music Representation Learning
Pablo Alonso-Jiménez
Xavier Serra
Dmitry Bogdanov
ViT
35
3
0
28 Sep 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
34
0
0
28 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
28
69
0
25 Sep 2023
Audio classification with Dilated Convolution with Learnable Spacings
Ismail Khalfaoui-Hassani
T. Masquelier
Thomas Pellegrini
25
1
0
25 Sep 2023
Attention Is All You Need For Blind Room Volume Estimation
Chunxiu Wang
Mao-shen Jia
Meiran Li
C. Bao
Wenyu Jin
36
7
0
23 Sep 2023
Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection
Qing Deng
Saturnino Luz
Sofia de la Fuente Garcia
20
0
0
23 Sep 2023
Asca: less audio data is more insightful
Xiang Li
Jing Chen
Chao Li
Hongwu Lv
20
0
0
23 Sep 2023
Investigating Efficient Deep Learning Architectures For Side-Channel Attacks on AES
Yohai-Eliel Berreby
L. Sauvage
AAML
15
2
0
22 Sep 2023
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Hao Chen
Yusen Wu
Phuong Nguyen
Chao Liu
Yelena Yesha
FedML
MoMe
24
0
0
21 Sep 2023