2104.11227
Multiscale Vision Transformers
22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
Papers citing "Multiscale Vision Transformers"
50 / 736 papers shown
Title
GraphVid: It Only Takes a Few Nodes to Understand a Video
Eitan Kosman
Dotan Di Castro
GNN
38
5
0
04 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
99
93
0
04 Jul 2022
A Survey on Label-efficient Deep Image Segmentation: Bridging the Gap between Weak Supervision and Dense Prediction
Wei Shen
Zelin Peng
Xuehui Wang
Huayu Wang
Jiazhong Cen
Dongsheng Jiang
Lingxi Xie
Xiaokang Yang
Qi Tian
VLM
19
77
0
04 Jul 2022
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Taeoh Kim
Jinhyung Kim
Minho Shim
Sangdoo Yun
Myunggu Kang
Dongyoon Wee
Sangyoun Lee
AI4TS
15
10
0
30 Jun 2022
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Vitaliy Chiley
Vithursan Thangarasa
Abhay Gupta
Anshul Samar
Joel Hestness
D. DeCoste
48
8
0
28 Jun 2022
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan
Ziyi Lin
Xiatian Zhu
Jing Shao
Hongsheng Li
21
190
0
27 Jun 2022
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
Muhammad Maaz
Abdelrahman M. Shaker
Hisham Cholakkal
Salman Khan
Syed Waqas Zamir
Rao Muhammad Anwer
F. Khan
ViT
27
184
0
21 Jun 2022
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
30
6
0
21 Jun 2022
One-stage Action Detection Transformer
Lijun Li
Lian Zhuo
Bangyin Zhang
ViT
22
0
0
21 Jun 2022
DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment
Haoning Wu
Chao-Yu Chen
Liang Liao
Jingwen Hou
Wenxiu Sun
Qiong Yan
Weisi Lin
ViT
25
50
0
20 Jun 2022
Learning Multiscale Transformer Models for Sequence Generation
Bei Li
Tong Zheng
Yi Jing
Chengbo Jiao
Tong Xiao
Jingbo Zhu
24
9
0
19 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
34
131
0
18 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
35
97
0
16 Jun 2022
Video Capsule Endoscopy Classification using Focal Modulation Guided Convolutional Neural Network
Abhishek Srivastava
Nikhil Kumar Tomar
Ulas Bagci
Debesh Jha
MedIm
16
15
0
16 Jun 2022
Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022
Elad Ben-Avraham
Roei Herzig
K. Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Amir Globerson
13
3
0
15 Jun 2022
VCT: A Video Compression Transformer
Fabian Mentzer
G. Toderici
David C. Minnen
S. Hwang
Sergi Caelles
Mario Lucic
E. Agustsson
ViT
19
97
0
15 Jun 2022
It's Time for Artistic Correspondence in Music and Video
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
13
37
0
14 Jun 2022
Stand-Alone Inter-Frame Attention in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Jiebo Luo
Tao Mei
ViT
28
46
0
14 Jun 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben-Avraham
Roei Herzig
K. Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Amir Globerson
19
0
0
13 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
24
15
0
13 Jun 2022
GateHUB: Gated History Unit with Background Suppression for Online Action Detection
Junwen Chen
Gaurav Mittal
Ye Yu
Yu Kong
Mei Chen
39
33
0
09 Jun 2022
SimVP: Simpler yet Better Video Prediction
Zhangyang Gao
Cheng Tan
Lirong Wu
Stan Z. Li
38
212
0
09 Jun 2022
Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting
Amin Shabani
A. Abdi
Li Meng
Tristan Sylvain
AI4TS
27
61
0
08 Jun 2022
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector
Lin Sui
Chen-Da Liu-Zhang
Lixin Gu
Feng Han
22
8
0
07 Jun 2022
MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning
Zhifeng Ma
Hao Zhang
Jie Liu
HAI
AI4CE
25
12
0
07 Jun 2022
A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information
M. Kowal
Mennatullah Siam
Md. Amirul Islam
Neil D. B. Bruce
Richard P. Wildes
Konstantinos G. Derpanis
18
25
0
06 Jun 2022
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
Richard J. Chen
Chengkuan Chen
Yicong Li
Tiffany Y. Chen
A. Trister
Rahul G. Krishnan
Faisal Mahmood
ViT
MedIm
34
406
0
06 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li
Junyu Chen
Yucheng Tang
Ce Wang
Bennett A. Landman
S. K. Zhou
ViT
OOD
MedIm
21
20
0
02 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
24
52
0
02 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
19
99
0
02 Jun 2022
Transformer with Fourier Integral Attentions
T. Nguyen
Minh Pham
Tam Nguyen
Khai Nguyen
Stanley J. Osher
Nhat Ho
17
4
0
01 Jun 2022
Robotic grasp detection based on Transformer
Mingshuai Dong
Xiuli Yu
ViT
29
13
0
30 May 2022
Future Transformer for Long-term Action Anticipation
Dayoung Gong
Joonseok Lee
Manjin Kim
S. Ha
Minsu Cho
AI4TS
8
61
0
27 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
146
638
0
26 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
27
68
0
26 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
110
73
0
20 May 2022
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
Feng Liu
Xiaosong Zhang
Zhiliang Peng
Zonghao Guo
Fang Wan
Xian-Wei Ji
QiXiang Ye
ObjD
43
20
0
19 May 2022
Vision Transformer: Vit and its Derivatives
Zujun Fu
ViT
33
6
0
12 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
L. Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
31
180
0
06 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
22
0
0
03 May 2022
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Alexandros Stergiou
Dima Damen
AI4TS
EgoV
EDL
17
7
0
28 Apr 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
27
0
0
25 Apr 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation
Shusheng Yang
Xinggang Wang
Yu Li
Yuxin Fang
Jiemin Fang
Wenyu Liu
Xun Zhao
Ying Shan
ViT
24
64
0
18 Apr 2022
Video Action Detection: Analysing Limitations and Challenges
Rajat Modi
A. J. Rana
Akash Kumar
Praveen Tirupattur
Shruti Vyas
Y. S. Rawat
M. Shah
14
12
0
17 Apr 2022
ResT V2: Simpler, Faster and Stronger
Qing-Long Zhang
Yubin Yang
ViT
32
25
0
15 Apr 2022
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
42
389
0
14 Apr 2022
Learning Local and Global Temporal Contexts for Video Semantic Segmentation
Guolei Sun
Yun Liu
Henghui Ding
Min Wu
Luc Van Gool
27
32
0
07 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
23
55
0
06 Apr 2022
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
Mingfei Han
David Junhao Zhang
Yali Wang
Rui Yan
L. Yao
Xiaojun Chang
Yu Qiao
21
55
0
05 Apr 2022
MaxViT: Multi-Axis Vision Transformer
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
48
636
0
04 Apr 2022