2303.10757
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
19 March 2023
Wentao Zhu
M. Omar
Papers citing "Multiscale Audio Spectrogram Transformer for Efficient Audio Classification"
19 / 19 papers shown
Multimodal Action Quality Assessment
Ling-an Zeng
Wei-Shi Zheng
91
14
0
31 Jan 2024
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
85
41
0
06 Apr 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
51
29
0
13 Mar 2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
144
689
0
02 Dec 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
98
565
0
30 Jun 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
127
1,259
0
22 Apr 2021
Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
77
68
0
05 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
527
3,722
0
24 Feb 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Yuan Gong
Yu-An Chung
James R. Glass
VLM
159
147
0
02 Feb 2021
Rescaling Egocentric Vision
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
65
458
0
23 Jun 2020
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai
Guokun Lai
Yiming Yang
Quoc V. Le
83
233
0
05 Jun 2020
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
68
232
0
29 Apr 2020
VGGSound: A Large-scale Audio-Visual Dataset
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
89
576
0
29 Apr 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
248
208
0
23 Jan 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
186
1,076
0
21 Dec 2019
A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling
Yun Wang
Juncheng Billy Li
Florian Metze
46
184
0
22 Oct 2018
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
111
903
0
23 May 2017
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
120
2,498
0
29 Sep 2016
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
330
8,116
0
13 Aug 2016