Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.09286
Cited By
FlexiAST: Flexibility is What AST Needs
18 July 2023
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FlexiAST: Flexibility is What AST Needs"
16 / 16 papers shown
Title
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
68
280
0
13 Jul 2022
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
64
68
0
26 Apr 2022
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
Alan Baade
Puyuan Peng
David Harwath
52
101
0
30 Mar 2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
144
686
0
02 Dec 2021
Efficient Training of Audio Transformers with Patchout
Khaled Koutini
Jan Schluter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
117
257
0
11 Oct 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
92
560
0
30 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
147
2,939
0
14 Jun 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
127
1,254
0
22 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
483
3,699
0
24 Feb 2021
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng
Jiachen Lu
Hengshuang Zhao
Xiatian Zhu
Zekun Luo
...
Yanwei Fu
Jianfeng Feng
Tao Xiang
Philip Torr
Li Zhang
ViT
170
2,893
0
31 Dec 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
221
5,767
0
20 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
349
12,966
0
26 May 2020
VGGSound: A Large-scale Audio-Visual Dataset
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
70
573
0
29 Apr 2020
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
217
3,667
0
06 Aug 2019
Video Action Transformer Network
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
ViT
124
708
0
06 Dec 2018
Learning from Between-class Examples for Deep Sound Recognition
Yuji Tokozume
Yoshitaka Ushiku
Tatsuya Harada
SSL
72
237
0
28 Nov 2017
1