Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2104.11227
Cited By
Multiscale Vision Transformers
22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multiscale Vision Transformers"
50 / 736 papers shown
Title
AutoFocusFormer: Image Segmentation off the Grid
Chen Ziwen
K. Patnaik
Shuangfei Zhai
Alvin Wan
Zhile Ren
A. Schwing
Alex Colburn
Li Fuxin
17
9
0
24 Apr 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
32
13
0
24 Apr 2023
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
Siyuan Wei
Tianzhu Ye
Shen Zhang
Yao Tang
Jiajun Liang
ViT
11
65
0
21 Apr 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
30
35
0
20 Apr 2023
Transformer-Based Visual Segmentation: A Survey
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT
MedIm
42
132
0
19 Apr 2023
MLP-AIR: An Efficient MLP-Based Method for Actor Interaction Relation Learning in Group Activity Recognition
Guoliang Xu
Jianqin Yin
19
1
0
18 Apr 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
36
14
0
17 Apr 2023
MS-LSTM: Exploring Spatiotemporal Multiscale Representations in Video Prediction Domain
Zhifeng Ma
Hao Zhang
Jie Liu
21
7
0
16 Apr 2023
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment
Kai Zhao
Kun Yuan
Ming-Ting Sun
Xingsen Wen
13
20
0
13 Apr 2023
How you feelin'? Learning Emotions and Mental States in Movie Scenes
D. Srivastava
A. Singh
Makarand Tapaswi
32
10
0
12 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition
Guangyong Wei
Zhikui Duan
Shiren Li
Guangguang Yang
Xinmei Yu
Junhua Li
22
4
0
11 Apr 2023
VARS: Video Assistant Referee System for Automated Soccer Decision Making from Multiple Views
Jan Held
A. Cioppa
Silvio Giancola
Abdullah Hamdi
Bernard Ghanem
Marc Van Droogenbroeck
27
29
0
10 Apr 2023
Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
Xuran Pan
Tianzhu Ye
Zhuofan Xia
S. Song
Gao Huang
ViT
31
53
0
09 Apr 2023
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Ziteng Gao
Zhan Tong
Limin Wang
Mike Zheng Shou
33
9
0
07 Apr 2023
PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift
Gaojie Wu
Weishi Zheng
Yutong Lu
Q. Tian
ViT
45
15
0
07 Apr 2023
EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation
Y. Shi
H. Cai
Amin Ansari
Fatih Porikli
MDE
88
17
0
06 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
F. Khan
M. Shah
VLM
VPVLM
30
73
0
06 Apr 2023
Diffusion Models as Masked Autoencoders
Chen Wei
K. Mangalam
Po-Yao (Bernie) Huang
Yanghao Li
Haoqi Fan
Hu Xu
Huiyu Wang
Cihang Xie
Alan Yuille
Christoph Feichtenhofer
DiffM
SyDa
36
48
0
06 Apr 2023
Inductive biases in deep learning models for weather prediction
Jannik Thümmel
Matthias Karlbauer
S. Otte
C. Zarfl
Georg Martius
...
Thomas Scholten
Ulrich Friedrich
V. Wulfmeyer
B. Goswami
Martin Volker Butz
AI4CE
38
5
0
06 Apr 2023
InterFormer: Real-time Interactive Image Segmentation
YouFu Huang
Hao Yang
Ke Sun
Shengchuan Zhang
Liujuan Cao
Guannan Jiang
Rongrong Ji
32
22
0
06 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
33
19
0
05 Apr 2023
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang
Haibin Ling
15
2
0
04 Apr 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Christoph Feichtenhofer
Jitendra Malik
36
30
0
03 Apr 2023
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yong-Lu Li
Xiaoqian Wu
Xinpeng Liu
Zehao Wang
Yiming Dou
...
Junyi Zhang
Yixing Li
Jingru Tan
Xudong Lu
Cewu Lu
27
17
0
02 Apr 2023
Video Pretraining Advances 3D Deep Learning on Chest CT Tasks
Alexander Ke
Shih-Cheng Huang
Chloe P. O'Connell
M. Klimont
Serena Yeung
Pranav Rajpurkar
19
7
0
02 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
21
0
0
01 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Show
13
4
0
01 Apr 2023
APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding
Hengjia Li
Tu Zheng
Zhihao Chi
Zheng Yang
Wenxiao Wang
Boxi Wu
Binbin Lin
Deng Cai
3DPC
38
1
0
31 Mar 2023
Streaming Video Model
Yucheng Zhao
Chong Luo
Chuanxin Tang
Dongdong Chen
Noel Codella
Zhengjun Zha
33
12
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
59
325
0
29 Mar 2023
Point2Vec for Self-Supervised Representation Learning on Point Clouds
Karim Abou Zeid
Jonas Schult
Alexander Hermans
Bastian Leibe
3DPC
20
27
0
29 Mar 2023
ARMBench: An Object-centric Benchmark Dataset for Robotic Manipulation
Chaitanya Mitash
Fan Wang
Shiyang Lu
Vikedo Terhuja
T. Garaas
F. Polido
M. Nambi
22
27
0
29 Mar 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
I. Dave
Mamshad Nayeem Rizve
C. L. P. Chen
M. Shah
TTA
38
16
0
28 Mar 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
25
3
0
28 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
43
154
0
28 Mar 2023
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
Abdelrahman M. Shaker
Muhammad Maaz
H. Rasheed
Salman Khan
Ming Yang
F. Khan
ViT
48
84
0
27 Mar 2023
Frame Flexible Network
Yitian Zhang
Yue Bai
Chang Liu
Huan Wang
Sheng R. Li
Yun Fu
11
4
0
26 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
34
94
0
25 Mar 2023
Towards Scalable Neural Representation for Diverse Videos
Bo He
Xitong Yang
Hanyu Wang
Zuxuan Wu
Hao Chen
Shuaiyi Huang
Yixuan Ren
Ser-Nam Lim
Abhinav Shrivastava
54
41
0
24 Mar 2023
Learning and Verification of Task Structure in Instructional Videos
Medhini Narasimhan
Licheng Yu
Sean Bell
Ning Zhang
Trevor Darrell
62
19
0
23 Mar 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
113
63
0
23 Mar 2023
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Sixun Dong
Huazhang Hu
Dongze Lian
Weixin Luo
Yichen Qian
Shenghua Gao
ViT
AI4TS
21
11
0
22 Mar 2023
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
J. Hernandez
Ruben Villegas
Vicente Ordonez
SSL
33
4
0
21 Mar 2023
The Multiscale Surface Vision Transformer
Simon Dahan
Logan Z. J. Williams
Daniel Rueckert
E. C. Robinson
MedIm
ViT
10
2
0
21 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
37
22
0
19 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Jungin Park
Jiyoung Lee
K. Sohn
ViT
21
37
0
17 Mar 2023
Video Action Recognition with Attentive Semantic Units
Yifei Chen
Dapeng Chen
Ruijin Liu
Hao Li
Wei Peng
19
11
0
17 Mar 2023
MATIS: Masked-Attention Transformers for Surgical Instrument Segmentation
Nicolás Ayobi
Alejandra Pérez-Rondón
Santiago Rodríguez
Pablo Arbelaez
MedIm
43
18
0
16 Mar 2023
Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm
Hengyuan Zhao
Hao Luo
Yuyang Zhao
Pichao Wang
F. Wang
Mike Zheng Shou
21
5
0
14 Mar 2023
Previous
1
2
3
...
7
8
9
...
13
14
15
Next