arXiv: 2104.01778
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing "AST: Audio Spectrogram Transformer"
50 / 463 papers shown
Title
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
26
51
0
15 Dec 2022
CLIPPO: Image-and-Language Understanding from Pixels Only
Michael Tschannen
Basil Mustafa
N. Houlsby
CLIP
VLM
32
47
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
34
73
0
15 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
37
43
0
09 Dec 2022
Federated Learning for Inference at Anytime and Anywhere
Zicheng Liu
Da Li
Javier Fernandez-Marques
Stefanos Laskaridis
Yan Gao
L. Dudziak
Stan Z. Li
S. Hu
Timothy M. Hospedales
FedML
32
5
0
08 Dec 2022
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
Ronghui Li
Junfan Zhao
Yachao Zhang
Mingyang Su
Zeping Ren
Han Zhang
Yansong Tang
Xiuhua Li
DiffM
30
51
0
07 Dec 2022
Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers
Khaled Koutini
Shahed Masoudian
Florian Schmid
Hamid Eghbalzadeh
Jan Schluter
Gerhard Widmer
19
5
0
25 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
19
20
0
25 Nov 2022
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
Sara Atito
Muhammad Awais
Wenwu Wang
Mark D. Plumbley
J. Kittler
ViT
18
9
0
23 Nov 2022
Ontology-aware Learning and Evaluation for Audio Tagging
Haohe Liu
Qiuqiang Kong
Xubo Liu
Xinhao Mei
Wenwu Wang
Mark D. Plumbley
22
4
0
22 Nov 2022
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Z. Yao
Xiaoxia Wu
Conglong Li
Connor Holmes
Minjia Zhang
Cheng-rong Li
Yuxiong He
28
11
0
17 Nov 2022
Music Instrument Classification Reprogrammed
Hsin-Hung Chen
Alexander Lerch
24
4
0
15 Nov 2022
The Birds Need Attention Too: Analysing usage of Self Attention in identifying bird calls in soundscapes
Chandra Kanth Nagesh
Abhishek Purushothama
24
2
0
14 Nov 2022
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Florian Schmid
Khaled Koutini
Gerhard Widmer
ViT
28
58
0
09 Nov 2022
Effective Audio Classification Network Based on Paired Inverse Pyramid Structure and Dense MLP Block
Yunhao Chen
Yunjie Zhu
Zihui Yan
Yifan Huang
Zhen Ren
Jianlu Shen
Lifang Chen
28
9
0
05 Nov 2022
Integrated Parameter-Efficient Tuning for General-Purpose Audio Models
Ju-ho Kim
Ju-Sung Heo
Hyun-Seo Shin
Chanmann Lim
Ha-Jin Yu
26
5
0
04 Nov 2022
MAST: Multiscale Audio Spectrogram Transformers
Sreyan Ghosh
Ashish Seth
S. Umesh
Tianyi Zhou
22
3
0
02 Nov 2022
Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming
Yun-Ning Hung
Chao-Han Huck Yang
Pin-Yu Chen
Alexander Lerch
27
17
0
02 Nov 2022
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
40
30
0
26 Oct 2022
Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19
M. Gauy
Marcelo Finger
24
7
0
25 Oct 2022
GCT: Gated Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Yun Wang
Wenwu Wang
Dick Botteldooren
33
0
0
22 Oct 2022
Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system
Francesca Ronchini
Samuele Cornell
Romain Serizel
Nicolas Turpault
Eduardo Fonseca
D. Ellis
30
14
0
14 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Emily R. Bartusiak
Edward J. Delp
27
12
0
14 Oct 2022
Supervised and Unsupervised Learning of Audio Representations for Music Understanding
Matthew C. McCallum
Filip Korzeniowski
Sergio Oramas
F. Gouyon
Andreas F. Ehmann
SSL
80
36
0
07 Oct 2022
PSVRF: Learning to restore Pitch-Shifted Voice without reference
Yangfu Li
Xiaodan Lin
Jiaxin Yang
19
0
0
06 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
37
120
0
02 Oct 2022
An empirical study of weakly supervised audio tagging embeddings for general audio representations
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
43
1
0
30 Sep 2022
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
124
50
0
28 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
51
28
0
28 Sep 2022
InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
Mu Yuan
Lan Zhang
Fengxiang He
Xueting Tong
Miao-Hui Song
Zhengyuan Xu
Xiang-Yang Li
32
2
0
28 Sep 2022
UniKW-AT: Unified Keyword Spotting and Audio Tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
45
3
0
23 Sep 2022
An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition
Yang Wu
Pai Peng
Zhenyu Zhang
Yanyan Zhao
Bing Qin
27
1
0
20 Sep 2022
I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization
Dianwen Ng
J. Yip
Tanmay Surana
Zhao Yang
Chong Zhang
Yukun Ma
Chongjia Ni
Chng Eng Siong
B. Ma
35
6
0
14 Sep 2022
Classify Respiratory Abnormality in Lung Sounds Using STFT and a Fine-Tuned ResNet18 Network
Zizhao Chen
Hongliang Wang
Chia-Hui Yeh
Xilin Liu
17
15
0
30 Aug 2022
MuLan: A Joint Embedding of Music Audio and Natural Language
Qingqing Huang
A. Jansen
Joonseok Lee
Ravi Ganti
Judith Yue Li
D. Ellis
30
131
0
26 Aug 2022
Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers
Paul Primus
Gerhard Widmer
VLM
21
5
0
24 Aug 2022
A differentiable short-time Fourier transform with respect to the window length
Maxime Leiber
Axel Barrau
Y. Marnissi
D. Abboud
9
8
0
23 Aug 2022
Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition
Armand K. Koupai
M. J. Bocus
Raúl Santos-Rodríguez
Robert Piechocki
Ryan McConville
ViT
32
9
0
15 Aug 2022
Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion
Yuanbo Hou
Bo Kang
Dick Botteldooren
18
3
0
03 Aug 2022
Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding
Mireille Fares
Michele Grimaldi
Catherine Pelachaud
Nicolas Obin
32
17
0
03 Aug 2022
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
33
21
0
29 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
29
10
0
21 Jul 2022
Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
N. Harada
K. Kashino
11
2
0
20 Jul 2022
COVID-19 Detection from Respiratory Sounds with Hierarchical Spectrogram Transformers
Idil Aytekin
Onat Dalmaz
Kaan Gonc
H. Ankishan
E. Saritas
Ulas Bagci
H. Celik
Tolga Çukur
22
12
0
19 Jul 2022
GAFX: A General Audio Feature eXtractor
Zhaoyang Bu
Han Zhang
Xiaohu Zhu
30
0
0
19 Jul 2022
Visually-aware Acoustic Event Detection using Heterogeneous Graphs
A. Shirian
Krishna Somandepalli
Victor Sanchez
T. Guha
14
3
0
16 Jul 2022
Segment-level Metric Learning for Few-shot Bioacoustic Event Detection
Haohe Liu
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
31
8
0
15 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming Yang
Serge Belongie
Huayu Chen
VLM
41
22
0
15 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
21
268
0
13 Jul 2022
Wayformer: Motion Forecasting via Simple & Efficient Attention Networks
Nigamaa Nayakanti
Rami Al-Rfou
Aurick Zhou
Kratarth Goel
Khaled S. Refaat
Benjamin Sapp
AI4TS
44
237
0
12 Jul 2022