Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2107.00652
Cited By
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
1 July 2021
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows"
50 / 440 papers shown
Title
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
40
158
0
24 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Lu Zhou
Lei Wang
Zaiyan Dai
Jun Yang
ViT
34
24
0
22 Oct 2022
Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets
Xiangyu Chen
Qinghao Hu
Kaidong Li
Cuncong Zhong
Guanghui Wang
ViT
38
11
0
22 Oct 2022
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
51
425
0
17 Oct 2022
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
Xiaoyang Wu
Yixing Lao
Li Jiang
Xihui Liu
Hengshuang Zhao
3DPC
ViT
32
369
0
11 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
39
60
0
04 Oct 2022
Learning Hierarchical Image Segmentation For Recognition and By Recognition
Tsung-Wei Ke
Sangwoo Mo
Stella X. Yu
VLM
42
9
0
01 Oct 2022
Effective Vision Transformer Training: A Data-Centric Perspective
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
26
5
0
29 Sep 2022
Axially Expanded Windows for Local-Global Interaction in Vision Transformers
Zhemin Zhang
Xun Gong
ViT
21
1
0
19 Sep 2022
Hybrid Window Attention Based Transformer Architecture for Brain Tumor Segmentation
Himashi Peiris
Munawar Hayat
Zhaolin Chen
Gary Egan
Mehrtash Harandi
MedIm
29
4
0
16 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
38
149
0
15 Sep 2022
Spatial-Temporal Transformer for Video Snapshot Compressive Imaging
Lishun Wang
Miao Cao
Yong Zhong
Xin Yuan
27
47
0
04 Sep 2022
Transformers in Remote Sensing: A Survey
Abdulaziz Amer Aleissaee
Amandeep Kumar
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
Guisong Xia
Fahad Shahbaz Khan
ViT
57
176
0
02 Sep 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
31
9
0
31 Aug 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
28
2
0
30 Aug 2022
ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers
Yutong Xie
Jianpeng Zhang
Yong-quan Xia
Anton Van Den Hengel
Qi Wu
33
6
0
28 Aug 2022
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling
Rui Wang
Zuxuan Wu
Dongdong Chen
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Luowei Zhou
Lu Yuan
Yu-Gang Jiang
ViT
43
4
0
25 Aug 2022
LWA-HAND: Lightweight Attention Hand for Interacting Hand Reconstruction
Xinhan Di
Pengqian Yu
CVBM
26
8
0
21 Aug 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
Lefei Zhang
36
243
0
08 Aug 2022
TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection
Zhipeng Luo
Gongjie Zhang
Changqing Zhou
Ti Liu
Shijian Lu
Liang Pan
3DPC
ViT
48
9
0
04 Aug 2022
DropKey
Bonan li
Yinhan Hu
Xuecheng Nie
Congying Han
Xiangjian Jiang
Tiande Guo
Luoqi Liu
20
11
0
04 Aug 2022
Unified Normalization for Accelerating and Stabilizing Transformers
Qiming Yang
Kai Zhang
Chaoxiang Lan
Zhi Yang
Zheyang Li
Wenming Tan
Jun Xiao
Shiliang Pu
17
8
0
02 Aug 2022
A Novel Transformer Network with Shifted Window Cross-Attention for Spatiotemporal Weather Forecasting
Alabi Bojesomo
Hasan Al-Marzouqi
P. Liatsis
19
9
0
02 Aug 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Yongming Rao
Wenliang Zhao
Yansong Tang
Jie Zhou
Ser-Nam Lim
Jiwen Lu
ViT
22
251
0
28 Jul 2022
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger
Cong Wang
Hongmin Xu
Xiong Zhang
Li Wang
Zhitong Zheng
Haifeng Liu
ViT
20
21
0
27 Jul 2022
Efficient High-Resolution Deep Learning: A Survey
Arian Bakhtiarnia
Qi Zhang
Alexandros Iosifidis
MedIm
21
19
0
26 Jul 2022
Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation
Jiaming Zhang
Kailun Yang
Haowen Shi
Simon Reiß
Kunyu Peng
Chaoxiang Ma
Haodong Fu
Philip H. S. Torr
Kaiwei Wang
Rainer Stiefelhagen
ViT
MDE
36
36
0
25 Jul 2022
Multi-manifold Attention for Vision Transformers
D. Konstantinidis
Ilias Papastratis
K. Dimitropoulos
P. Daras
ViT
27
16
0
18 Jul 2022
IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection
Qingle Guo
Ruofei Wang
Rui Huang
Wei Fan
Yuxiang Zhang
27
14
0
15 Jul 2022
Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP
Zhicai Wang
Y. Hao
Xingyu Gao
Hao Zhang
Shuo Wang
Tingting Mu
Xiangnan He
21
8
0
15 Jul 2022
Bootstrapped Masked Autoencoders for Vision BERT Pretraining
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
22
75
0
14 Jul 2022
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
Jiashi Li
Xin Xia
W. Li
Huixia Li
Xing Wang
Xuefeng Xiao
Rui Wang
Min Zheng
Xin Pan
ViT
17
149
0
12 Jul 2022
CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Runsheng Xu
Zhengzhong Tu
Hao Xiang
Wei Shao
Bolei Zhou
Jiaqi Ma
56
219
0
05 Jul 2022
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung
Jun Gao
Fangyin Wei
Sanja Fidler
21
3
0
05 Jul 2022
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
Yukang Chen
Jianhui Liu
Xinming Zhang
Xiaojuan Qi
Jiaya Jia
56
85
0
21 Jun 2022
Global Context Vision Transformers
Ali Hatamizadeh
Hongxu Yin
Greg Heinrich
Jan Kautz
Pavlo Molchanov
ViT
25
120
0
20 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
39
32
0
19 Jun 2022
Video Capsule Endoscopy Classification using Focal Modulation Guided Convolutional Neural Network
Abhishek Srivastava
Nikhil Kumar Tomar
Ulas Bagci
Debesh Jha
MedIm
24
15
0
16 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
Chong Li
Biao Wang
Xihan Wei
Lei Zhang
M. Keuper
Xia Hua
ViT
37
15
0
15 Jun 2022
Peripheral Vision Transformer
Juhong Min
Yucheng Zhao
Chong Luo
Minsu Cho
ViT
MDE
32
30
0
14 Jun 2022
Recurrent Video Restoration Transformer with Guided Deformable Attention
Christos Sakaridis
Yuchen Fan
Xiaoyu Xiang
Rakesh Ranjan
Eddy Ilg
Simon Green
Jingyun Liang
Peng Sun
Radu Timofte
Luc Van Gool
42
153
0
05 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li
Junyu Chen
Yucheng Tang
Ce Wang
Bennett A. Landman
S. K. Zhou
ViT
OOD
MedIm
23
22
0
02 Jun 2022
Self-Supervised Pre-training of Vision Transformers for Dense Prediction Tasks
Jaonary Rabarisoa
Velentin Belissen
Florian Chabot
Q. C. Pham
VLM
ViT
SSL
MDE
26
2
0
30 May 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
64
26
0
30 May 2022
WT-MVSNet: Window-based Transformers for Multi-view Stereo
Jinli Liao
Yikang Ding
Yoli Shavit
Dihe Huang
Shihao Ren
Jia Guo
Wensen Feng
Kai Zhang
ViT
11
29
0
28 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
152
641
0
26 May 2022
Fast Vision Transformers with HiLo Attention
Zizheng Pan
Jianfei Cai
Bohan Zhuang
28
152
0
26 May 2022
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
37
187
0
25 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
119
73
0
20 May 2022
TRT-ViT: TensorRT-oriented Vision Transformer
Xin Xia
Jiashi Li
Jie Wu
Xing Wang
Xuefeng Xiao
Min Zheng
Rui Wang
ViT
23
27
0
19 May 2022
Previous
1
2
3
4
5
6
7
8
9
Next