ResearchTrend.AI
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

1 July 2021
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
    ViT
ArXiv (abs) | PDF | HTML | GitHub (569★)

Papers citing "CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows"

50 / 440 papers shown
Title
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
180
700
0
10 Nov 2022
Demystify Transformers & Convolutions in Modern Image Deep Networks
Jifeng Dai
Min Shi
Weiyun Wang
Sitong Wu
Linjie Xing
...
Lewei Lu
Jie Zhou
Xiaogang Wang
Yu Qiao
Xiao-hua Hu
ViT
105
11
0
10 Nov 2022
ViT-LSLA: Vision Transformer with Light Self-Limited-Attention
Zhenzhe Hechen
Wei Huang
Yixin Zhao
ViT
59
6
0
31 Oct 2022
Grafting Vision Transformers
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
97
3
0
28 Oct 2022
SemFormer: Semantic Guided Activation Transformer for Weakly Supervised Semantic Segmentation
Junliang Chen
Xiaodong Zhao
Cheng Luo
Linlin Shen
ViT
118
3
0
26 Oct 2022
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
110
171
0
24 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Lu Zhou
Lei Wang
Zaiyan Dai
Jun Yang
ViT
138
27
0
22 Oct 2022
Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets
Xiangyu Chen
Qinghao Hu
Kaidong Li
Cuncong Zhong
Guanghui Wang
ViT
81
13
0
22 Oct 2022
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
135
475
0
17 Oct 2022
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
Xiaoyang Wu
Yixing Lao
Li Jiang
Xihui Liu
Hengshuang Zhao
3DPC, ViT
179
407
0
11 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT, MoE
126
66
0
04 Oct 2022
Learning Hierarchical Image Segmentation For Recognition and By Recognition
Tsung-Wei Ke
Sangwoo Mo
Stella X. Yu
VLM
146
11
0
01 Oct 2022
Effective Vision Transformer Training: A Data-Centric Perspective
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
80
5
0
29 Sep 2022
Axially Expanded Windows for Local-Global Interaction in Vision Transformers
Zhemin Zhang
Xun Gong
ViT
52
1
0
19 Sep 2022
Hybrid Window Attention Based Transformer Architecture for Brain Tumor Segmentation
Himashi Peiris
Munawar Hayat
Zhaolin Chen
Gary Egan
Mehrtash Harandi
MedIm
61
6
0
16 Sep 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM, VLM
152
153
0
15 Sep 2022
Spatial-Temporal Transformer for Video Snapshot Compressive Imaging
Lishun Wang
Miao Cao
Yong Zhong
Xin Yuan
73
48
0
04 Sep 2022
Transformers in Remote Sensing: A Survey
Abdulaziz Amer Aleissaee
Amandeep Kumar
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
Guisong Xia
Fahad Shahbaz Khan
ViT
105
196
0
02 Sep 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
70
15
0
31 Aug 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
106
2
0
30 Aug 2022
ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers
Yutong Xie
Jianpeng Zhang
Yong-quan Xia
Anton Van Den Hengel
Qi Wu
63
6
0
28 Aug 2022
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling
Rui Wang
Zuxuan Wu
Dongdong Chen
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Luowei Zhou
Lu Yuan
Yu-Gang Jiang
ViT
118
5
0
25 Aug 2022
LWA-HAND: Lightweight Attention Hand for Interacting Hand Reconstruction
Xinhan Di
Pengqian Yu
CVBM
80
8
0
21 Aug 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
Lefei Zhang
90
257
0
08 Aug 2022
TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection
Zhipeng Luo
Gongjie Zhang
Changqing Zhou
Ti Liu
Shijian Lu
Liang Pan
3DPC, ViT
93
9
0
04 Aug 2022
DropKey
Bonan Li
Yinhan Hu
Xuecheng Nie
Congying Han
Xiangjian Jiang
Tiande Guo
Luoqi Liu
59
12
0
04 Aug 2022
Unified Normalization for Accelerating and Stabilizing Transformers
Qiming Yang
Kai Zhang
Chaoxiang Lan
Zhi Yang
Zheyang Li
Wenming Tan
Jun Xiao
Shiliang Pu
77
8
0
02 Aug 2022
A Novel Transformer Network with Shifted Window Cross-Attention for Spatiotemporal Weather Forecasting
Alabi Bojesomo
Hasan Al-Marzouqi
P. Liatsis
86
10
0
02 Aug 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Yongming Rao
Wenliang Zhao
Yansong Tang
Jie Zhou
Ser-Nam Lim
Jiwen Lu
ViT
119
256
0
28 Jul 2022
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger
Cong Wang
Hongmin Xu
Xiong Zhang
Li Wang
Zhitong Zheng
Haifeng Liu
ViT
61
23
0
27 Jul 2022
Efficient High-Resolution Deep Learning: A Survey
Arian Bakhtiarnia
Qi Zhang
Alexandros Iosifidis
MedIm
158
21
0
26 Jul 2022
Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation
Jiaming Zhang
Kailun Yang
Haowen Shi
Simon Reiß
Kunyu Peng
Chaoxiang Ma
Haodong Fu
Philip H. S. Torr
Kaiwei Wang
Rainer Stiefelhagen
ViT, MDE
112
39
0
25 Jul 2022
Multi-manifold Attention for Vision Transformers
D. Konstantinidis
Ilias Papastratis
K. Dimitropoulos
P. Daras
ViT
103
16
0
18 Jul 2022
IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection
Qingle Guo
Ruofei Wang
Rui Huang
Wei Fan
Yuxiang Zhang
67
16
0
15 Jul 2022
Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP
Zhicai Wang
Y. Hao
Xingyu Gao
Hao Zhang
Shuo Wang
Tingting Mu
Xiangnan He
76
8
0
15 Jul 2022
Bootstrapped Masked Autoencoders for Vision BERT Pretraining
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
91
80
0
14 Jul 2022
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
Jiashi Li
Xin Xia
W. Li
Huixia Li
Xing Wang
Xuefeng Xiao
Rui Wang
Min Zheng
Xin Pan
ViT
96
155
0
12 Jul 2022
CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Runsheng Xu
Zhengzhong Tu
Hao Xiang
Wei Shao
Bolei Zhou
Jiaqi Ma
153
229
0
05 Jul 2022
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung
Jun Gao
Fangyin Wei
Sanja Fidler
82
3
0
05 Jul 2022
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
Yukang Chen
Jianhui Liu
Xinming Zhang
Xiaojuan Qi
Jiaya Jia
124
91
0
21 Jun 2022
Global Context Vision Transformers
Ali Hatamizadeh
Hongxu Yin
Greg Heinrich
Jan Kautz
Pavlo Molchanov
ViT
84
129
0
20 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
123
35
0
19 Jun 2022
Video Capsule Endoscopy Classification using Focal Modulation Guided Convolutional Neural Network
Abhishek Srivastava
Nikhil Kumar Tomar
Ulas Bagci
Debesh Jha
MedIm
62
16
0
16 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
Chong Li
Biao Wang
Xihan Wei
Lei Zhang
Margret Keuper
Xia Hua
ViT
73
15
0
15 Jun 2022
Peripheral Vision Transformer
Juhong Min
Yucheng Zhao
Chong Luo
Minsu Cho
ViT, MDE
82
33
0
14 Jun 2022
Recurrent Video Restoration Transformer with Guided Deformable Attention
Christos Sakaridis
Yuchen Fan
Xiaoyu Xiang
Rakesh Ranjan
Eddy Ilg
Simon Green
Jingyun Liang
Peng Sun
Radu Timofte
Luc Van Gool
146
170
0
05 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li
Junyu Chen
Yucheng Tang
Ce Wang
Bennett A. Landman
S. K. Zhou
ViT, OOD, MedIm
179
47
0
02 Jun 2022
Self-Supervised Pre-training of Vision Transformers for Dense Prediction Tasks
Jaonary Rabarisoa
Valentin Belissen
Florian Chabot
Q. C. Pham
VLM, ViT, SSL, MDE
45
3
0
30 May 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
106
29
0
30 May 2022
WT-MVSNet: Window-based Transformers for Multi-view Stereo
Jinli Liao
Yikang Ding
Yoli Shavit
Dihe Huang
Shihao Ren
Jia Guo
Wensen Feng
Kai Zhang
ViT
80
29
0
28 May 2022