arXiv: 2104.11227
Multiscale Vision Transformers
22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
Papers citing "Multiscale Vision Transformers"
50 / 736 papers shown
Title
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
38
101
0
04 Apr 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
24
91
0
04 Apr 2022
Deformable Video Transformer
Jue Wang
Lorenzo Torresani
ViT
22
28
0
31 Mar 2022
Exploring Plain Vision Transformer Backbones for Object Detection
Yanghao Li
Hanzi Mao
Ross B. Girshick
Kaiming He
ViT
33
774
0
30 Mar 2022
VPTR: Efficient Transformers for Video Prediction
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
24
18
0
29 Mar 2022
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection
Congcong Li
Xinyao Wang
Longyin Wen
Dexiang Hong
Tiejian Luo
Libo Zhang
21
16
0
29 Mar 2022
Self-supervised Video-centralised Transformer for Video Face Clustering
Yujiang Wang
Mingzhi Dong
Jie Shen
Yi-Si Luo
Yiming Lin
Pingchuan Ma
Stavros Petridis
M. Pantic
ViT
20
3
0
24 Mar 2022
Transformer Compressed Sensing via Global Image Tokens
M. B. Lorenzana
Craig B. Engstrom
Shekhar S. Chandra
ViT
MedIm
18
5
0
24 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
137
1,122
0
23 Mar 2022
Deep Frequency Filtering for Domain Generalization
Shiqi Lin
Zhizheng Zhang
Zhipeng Huang
Yan Lu
Cuiling Lan
...
Jiang Wang
Zicheng Liu
Amey Parulkar
V. Navkal
Zhibo Chen
25
49
0
23 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
21
32
0
22 Mar 2022
Scalable Video Object Segmentation with Identification Mechanism
Zongxin Yang
Jiaxu Miao
Yunchao Wei
Wenguan Wang
Xiaohan Wang
Yi Yang
VOS
36
23
0
22 Mar 2022
FAR: Fourier Aerial Video Recognition
D. Kothandaraman
Tianrui Guan
Xijun Wang
Sean Hu
Ming-Shun Lin
Dinesh Manocha
21
13
0
21 Mar 2022
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
Thanh-Dat Truong
Quoc-Huy Bui
C. Duong
Han-Seok Seo
S. L. Phung
Xin Li
Khoa Luu
ViT
34
49
0
19 Mar 2022
Three things everyone should know about Vision Transformers
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Jakob Verbeek
Hervé Jégou
ViT
18
119
0
18 Mar 2022
Group Contextualization for Video Recognition
Y. Hao
Haotong Zhang
Chong-Wah Ngo
Xiangnan He
8
25
0
18 Mar 2022
Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image
Xuanchi Ren
Xiaolong Wang
VGen
19
58
0
17 Mar 2022
Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?
Y. Fu
Shunyao Zhang
Shan-Hung Wu
Cheng Wan
Yingyan Lin
AAML
23
64
0
16 Mar 2022
Active Learning by Feature Mixing
Amin Parvaneh
Ehsan Abbasnejad
Damien Teney
Reza Haffari
A. Hengel
Javen Qinfeng Shi
29
89
0
14 Mar 2022
Towards Self-Supervised Learning of Global and Object-Centric Representations
Federico Baldassarre
Hossein Azizpour
SSL
3DPC
OCL
35
13
0
11 Mar 2022
PAMI-AD: An Activity Detector Exploiting Part-attention and Motion Information in Surveillance Videos
Yunhao Du
Zhihang Tong
Jun-Jun Wan
Binyu Zhang
Yanyun Zhao
19
3
0
08 Mar 2022
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions
Ruikang Ju
Ting-Yu Lin
Jen-Shiun Chiang
Jia-Hao Jian
Yu-Shian Lin
Liu-Rui-Yi Huang
ViT
16
1
0
02 Mar 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
X. Wang
ViT
VLM
189
499
0
22 Feb 2022
HiP: Hierarchical Perceiver
João Carreira
Skanda Koppula
Daniel Zoran
Adrià Recasens
Catalin Ionescu
...
M. Botvinick
Oriol Vinyals
Karen Simonyan
Andrew Zisserman
Andrew Jaegle
VLM
31
14
0
22 Feb 2022
Movies2Scenes: Using Movie Metadata to Learn Scene Representation
Shixing Chen
Chundi Liu
Xiang Hao
Xiaohan Nie
Maxim Arap
Raffay Hamid
21
17
0
22 Feb 2022
Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for Visual Discrimination
Qingsong Zhao
Shuguang Dou
Zhipeng Zhou
Yangguang Li
Yin Wang
Yu Qiao
Cairong Zhao
20
3
0
21 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
24
229
0
21 Feb 2022
ActionFormer: Localizing Moments of Actions with Transformers
Chen-Lin Zhang
Jianxin Wu
Yin Li
ViT
23
328
0
16 Feb 2022
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations
Youwei Liang
Chongjian Ge
Zhan Tong
Yibing Song
Jue Wang
P. Xie
ViT
14
233
0
16 Feb 2022
BOAT: Bilateral Local Attention Vision Transformer
Tan Yu
Gangming Zhao
Ping Li
Yizhou Yu
ViT
30
27
0
31 Jan 2022
Benchmarking Conventional Vision Models on Neuromorphic Fall Detection and Action Recognition Dataset
Karthik Sivarama Krishnan
Koushik Sivarama Krishnan
14
5
0
28 Jan 2022
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Ziyu Wang
Wenhao Jiang
Yiming Zhu
Li Yuan
Yibing Song
Wei Liu
37
43
0
28 Jan 2022
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism
Guangting Wang
Yucheng Zhao
Chuanxin Tang
Chong Luo
Wenjun Zeng
14
68
0
26 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
147
361
0
24 Jan 2022
ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer
Pengfei Guo
Yiqun Mei
Jinyuan Zhou
Shanshan Jiang
Vishal M. Patel
ViT
MedIm
81
65
0
23 Jan 2022
VIPriors 2: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
A. Lengyel
Robert-Jan Bruintjes
Marcos Baptista-Rios
O. Kayhan
Davide Zambrano
Nergis Tomen
J. C. V. Gemert
VLM
30
11
0
21 Jan 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
37
198
0
20 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
217
225
0
20 Jan 2022
Revisiting Weakly Supervised Pre-Training of Visual Perception Models
Mannat Singh
Laura Gustafson
Aaron B. Adcock
Vinicius de Freitas Reis
B. Gedik
Raj Prateek Kosaraju
D. Mahajan
Ross B. Girshick
Piotr Dollár
Laurens van der Maaten
VLM
32
122
0
20 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
38
238
0
12 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
26
211
0
12 Jan 2022
A ConvNet for the 2020s
Zhuang Liu
Hanzi Mao
Chao-Yuan Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
42
4,967
0
10 Jan 2022
Language as Queries for Referring Video Object Segmentation
Jiannan Wu
Yi-Xin Jiang
Pei Sun
Zehuan Yuan
Ping Luo
23
141
0
03 Jan 2022
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu
Tianyi Wu
Haoru Tan
G. Guo
ViT
25
70
0
28 Dec 2021
ELSA: Enhanced Local Self-Attention for Vision Transformer
Jingkai Zhou
Pichao Wang
Fan Wang
Qiong Liu
Hao Li
Rong Jin
ViT
34
37
0
23 Dec 2021
Learned Queries for Efficient Local Attention
Moab Arar
Ariel Shamir
Amit H. Bermano
ViT
36
29
0
21 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
29
244
0
21 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chao-Yuan Wu
Alan Yuille
Christoph Feichtenhofer
ViT
77
655
0
16 Dec 2021
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
20
54
0
14 Dec 2021