Multiscale Vision Transformers

22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
    ViT
arXiv: 2104.11227

Papers citing "Multiscale Vision Transformers"

36 / 736 papers shown
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLM
ViT
18
50
0
17 Jun 2021
Relation Modeling in Spatio-Temporal Action Localization
Yutong Feng
Jianwen Jiang
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Shiwei Zhang
Mingqian Tang
Yue Gao
22
11
0
15 Jun 2021
S$^2$-MLP: Spatial-Shift MLP Architecture for Vision
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
41
186
0
14 Jun 2021
DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation
Ai-Jun Lin
Bingzhi Chen
Jiayu Xu
Zheng-Wei Zhang
Guangming Lu
ViT
MedIm
15
605
0
12 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
8
274
0
09 Jun 2021
Fully Transformer Networks for Semantic Image Segmentation
Sitong Wu
Tianyi Wu
Fangjian Lin
Sheng Tian
Guodong Guo
ViT
34
39
0
08 Jun 2021
Vision Transformers with Hierarchical Attention
Yun-Hai Liu
Yu-Huan Wu
Guolei Sun
Le Zhang
Ajad Chhatkuli
Luc Van Gool
ViT
32
32
0
06 Jun 2021
Glance-and-Gaze Vision Transformer
Qihang Yu
Yingda Xia
Yutong Bai
Yongyi Lu
Alan Yuille
Wei Shen
ViT
12
74
0
04 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
25
207
0
03 Jun 2021
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen
Cho-Jui Hsieh
Boqing Gong
ViT
24
320
0
03 Jun 2021
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition
Yulin Wang
Rui Huang
S. Song
Zeyi Huang
Gao Huang
ViT
19
189
0
31 May 2021
ResT: An Efficient Transformer for Visual Recognition
Qing-Long Zhang
Yubin Yang
ViT
24
229
0
28 May 2021
KVT: k-NN Attention for Boosting Vision Transformers
Pichao Wang
Xue Wang
F. Wang
Ming Lin
Shuning Chang
Hao Li
R. L. Jin
ViT
34
105
0
28 May 2021
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
Meng-Hao Guo
Zheng-Ning Liu
Tai-Jiang Mu
Shimin Hu
20
473
0
05 May 2021
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
136
193
0
23 Apr 2021
CTNet: Context-based Tandem Network for Semantic Segmentation
Zechao Li
Yanpeng Sun
Jinhui Tang
15
173
0
20 Apr 2021
TubeR: Tubelet Transformer for Video Action Detection
Jiaojiao Zhao
Yanyi Zhang
Xinyu Li
Hao Chen
Shuai Bing
...
Yuanjun Xiong
Davide Modolo
I. Marsic
Cees G. M. Snoek
Joseph Tighe
ViT
28
70
0
02 Apr 2021
Learning Representational Invariances for Data-Efficient Action Recognition
Yuliang Zou
Jinwoo Choi
Qitong Wang
Jia-Bin Huang
14
39
0
30 Mar 2021
Predicting post-operative right ventricular failure using video-based deep learning
R. Shad
Nicolas Quach
R. Fong
P. Kasinpila
C. Bowles
...
Y. Woo
J. Teuteberg
John P. Cunningham
C. Langlotz
W. Hiesinger
20
40
0
28 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,777
0
24 Feb 2021
ROAD: The ROad event Awareness Dataset for Autonomous Driving
Gurkirt Singh
Stephen Akrigg
Manuele Di Maio
Valentina Fontana
Reza Javanmard Alitappeh
...
Salman Khan
S. Grazioso
Andrew Bradley
G. Gironimo
Fabio Cuzzolin
27
89
0
23 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
269
179
0
17 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,981
0
09 Feb 2021
TransReID: Transformer-based Object Re-Identification
Shuting He
Haowen Luo
Pichao Wang
F. Wang
Hao Li
Wei Jiang
ViT
213
794
0
08 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
198
421
0
01 Feb 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
227
2,428
0
04 Jan 2021
A Survey on Visual Transformer
Kai Han
Yunhe Wang
Hanting Chen
Xinghao Chen
Jianyuan Guo
...
Chunjing Xu
Yixing Xu
Zhaohui Yang
Yiman Zhang
Dacheng Tao
ViT
18
2,128
0
23 Dec 2020
Human Action Recognition from Various Data Modalities: A Review
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
42
504
0
22 Dec 2020
Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification
Youngwan Lee
Hyungil Kim
Kimin Yun
Jinyoung Moon
18
12
0
01 Dec 2020
Mutual Modality Learning for Video Action Classification
Stepan Alekseevich Komkov
Maksim Dzabraev
Aleksandr Petiushko
11
9
0
04 Nov 2020
Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit Latent Features
Myeongah Cho
Taeoh Kim
Woojin Kim
Suhwan Cho
Sangyoun Lee
6
90
0
15 Oct 2020
Video Action Understanding
Matthew Hutchinson
V. Gadepally
30
20
0
13 Oct 2020
Sparsifying Transformer Models with Trainable Representation Pooling
Michal Pietruszka
Łukasz Borchmann
Lukasz Garncarek
13
10
0
10 Sep 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
415
595
0
21 Jul 2020
LIP: Local Importance-based Pooling
Ziteng Gao
Limin Wang
Gangshan Wu
FAtt
29
94
0
12 Aug 2019
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
266
7,634
0
03 Jul 2012