2107.00652
Cited By
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
1 July 2021
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
Papers citing
"CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows"
50 / 440 papers shown
Title
FFT-based Dynamic Token Mixer for Vision
Yuki Tatsunami
Masato Taki
45
20
0
07 Mar 2023
Delivering Arbitrary-Modal Semantic Segmentation
Jiaming Zhang
R. Liu
Haowen Shi
Kailun Yang
Simon Reiß
Kunyu Peng
Haodong Fu
Kaiwei Wang
Rainer Stiefelhagen
VLM
56
89
0
02 Mar 2023
A Convolutional Vision Transformer for Semantic Segmentation of Side-Scan Sonar Data
Hayat Rajani
N. Gracias
Rafael García
ViT
27
12
0
24 Feb 2023
Human MotionFormer: Transferring Human Motions with Vision Transformers
Hongyu Liu
Xintong Han
Chengbin Jin
Lihui Qian
Huawei Wei
...
Faqiang Wang
Haoye Dong
Yibing Song
Jia Xu
Qifeng Chen
16
11
0
22 Feb 2023
Efficiency 360: Efficient Vision Transformers
Badri N. Patro
Vijay Srinivas Agneeswaran
30
6
0
16 Feb 2023
3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection
Jong Sung Park
Apoorv Singh
Varun Bankiti
3DPC
23
7
0
16 Feb 2023
CEDNet: A Cascade Encoder-Decoder Network for Dense Prediction
Gang Zhang
Zi-Hua Li
Chufeng Tang
Jianmin Li
Xiaolin Hu
24
16
0
13 Feb 2023
Reversible Vision Transformers
K. Mangalam
Haoqi Fan
Yanghao Li
Chaoxiong Wu
Bo Xiong
Christoph Feichtenhofer
Jitendra Malik
ViT
11
45
0
09 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
Chong Chen
Mu Li
ViT
58
144
0
06 Feb 2023
CECT: Controllable Ensemble CNN and Transformer for COVID-19 Image Classification
Zhao Liu
Leizhao Shen
ViT
29
8
0
05 Feb 2023
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
Jiayu Jiao
Yuyao Tang
Kun-Li Channing Lin
Yipeng Gao
Jinhua Ma
Yaowei Wang
Wei-Shi Zheng
MedIm
ViT
29
136
0
03 Feb 2023
Image Super-Resolution using Efficient Striped Window Transformer
Jinpeng Shi
Hui Li
Tian Yu Liu
Yulong Liu
Hao Fei
Jinchen Zhu
Ling Zheng
Shizhuang Weng
42
10
0
24 Jan 2023
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
Haiyang Wang
Chen Shi
Shaoshuai Shi
Meng Lei
Sen Wang
Di He
Bernt Schiele
Liwei Wang
41
119
0
15 Jan 2023
HRTransNet: HRFormer-Driven Two-Modality Salient Object Detection
Bin Tang
Zhengyi Liu
Yacheng Tan
Qian He
ViT
32
77
0
08 Jan 2023
FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection
Shuailei Ma
Yuefeng Wang
Shanze Wang
Ying-yu Wei
45
33
0
08 Jan 2023
Rethinking Mobile Block for Efficient Attention-based Models
Jiangning Zhang
Xiangtai Li
Jian Li
Liang Liu
Zhucun Xue
Boshen Zhang
Zhe Jiang
Tianxin Huang
Yabiao Wang
Chengjie Wang
MQ
44
91
0
03 Jan 2023
Representation Separation for Semantic Segmentation with Vision Transformers
Yuanduo Hong
Huihui Pan
Weichao Sun
Xinghu Yu
Huijun Gao
ViT
28
5
0
28 Dec 2022
SMMix: Self-Motivated Image Mixing for Vision Transformers
Yonghong Tian
Mingbao Lin
Zhihang Lin
Yuxin Zhang
Rongrong Ji
55
10
0
26 Dec 2022
DQnet: Cross-Model Detail Querying for Camouflaged Object Detection
Wei Sun
Chengao Liu
Linyan Zhang
Yu Li
Pengxu Wei
Chang-rui Liu
J. Zou
Jianbin Jiao
QiXiang Ye
48
6
0
16 Dec 2022
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
35
161
0
15 Dec 2022
Most Important Person-guided Dual-branch Cross-Patch Attention for Group Affect Recognition
Hongxia Xie
Ming-Xian Lee
Tzu-Jui Chen
Hung-Jen Chen
Hou-I Liu
Hong-Han Shuai
Wen-Huang Cheng
CVBM
35
8
0
14 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
Xueliang Wang
ViT
38
21
0
13 Dec 2022
Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images
L. Ding
Jing Zhang
Kai Zhang
Haitao Guo
Bing Liu
Lorenzo Bruzzone
29
49
0
10 Dec 2022
Co-training 2^L Submodels for Visual Recognition
Hugo Touvron
Matthieu Cord
Maxime Oquab
Piotr Bojanowski
Jakob Verbeek
Hervé Jégou
VLM
37
9
0
09 Dec 2022
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
32
87
0
08 Dec 2022
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
Hanqing Zhao
Dianmo Sheng
Jianmin Bao
Dongdong Chen
Dong Chen
...
Ce Liu
Wenbo Zhou
Qi Chu
Weiming Zhang
Neng H. Yu
VLM
DiffM
38
39
0
07 Dec 2022
Window Normalization: Enhancing Point Cloud Understanding by Unifying Inconsistent Point Densities
Qi Wang
Shengge Shi
Jiahui Li
Wuming Jiang
Xiangde Zhang
28
9
0
05 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
27
33
0
01 Dec 2022
FsaNet: Frequency Self-attention for Semantic Segmentation
Fengyu Zhang
Ashkan Panahi
Guangjun Gao
AI4TS
32
28
0
28 Nov 2022
Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation
Kaihong Wang
Donghyun Kim
Rogerio Feris
Kate Saenko
Margrit Betke
ViT
27
4
0
27 Nov 2022
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
46
5
0
25 Nov 2022
UperFormer: A Multi-scale Transformer-based Decoder for Semantic Segmentation
Jing Xu
W. Shi
Pan Gao
Zhengwei Wang
Qizhu Li
ViT
8
1
0
25 Nov 2022
Cross Aggregation Transformer for Image Restoration
Zheng Chen
Yulun Zhang
Jinjin Gu
Yongbing Zhang
L. Kong
X. Yuan
ViT
33
142
0
24 Nov 2022
A Dual-scale Lead-seperated Transformer With Lead-orthogonal Attention And Meta-information For ECG Classification
Heng Chang
Guijin Wang
Zhourui Xia
Wenming Yang
Li Sun
MedIm
37
1
0
23 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
39
6
0
23 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Ming-Ming Cheng
Jiashi Feng
ViT
34
129
0
22 Nov 2022
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
Haram Choi
Jeong-Sik Lee
Jihoon Yang
ViT
29
75
0
21 Nov 2022
Beyond the Field-of-View: Enhancing Scene Visibility and Perception with Clip-Recurrent Transformer
Haowen Shi
Zhijie Xu
Kailun Yang
Xiaoyue Yin
Ze Wang
Kaiwei Wang
ViT
43
5
0
21 Nov 2022
Vision Transformer with Super Token Sampling
Huaibo Huang
Xiaoqiang Zhou
Jie Cao
Ran He
Tieniu Tan
ViT
23
56
0
21 Nov 2022
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
Haoran You
Yunyang Xiong
Xiaoliang Dai
Bichen Wu
Peizhao Zhang
Haoqi Fan
Peter Vajda
Yingyan Lin
37
32
0
18 Nov 2022
EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
Yulin Wang
Yang Yue
Rui Lu
Tian-De Liu
Zhaobai Zhong
S. Song
Gao Huang
37
28
0
17 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
30
107
0
17 Nov 2022
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer
Haokui Zhang
Wenze Hu
Xiaoyu Wang
ViT
19
8
0
14 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
32
6
0
14 Nov 2022
Token Transformer: Can class token help window-based transformer build better long-range interactions?
Jia-ju Mao
Yuan Chang
Xuesong Yin
34
0
0
11 Nov 2022
Demystify Transformers & Convolutions in Modern Image Deep Networks
Jifeng Dai
Min Shi
Weiyun Wang
Sitong Wu
Linjie Xing
...
Lewei Lu
Jie Zhou
Xiaogang Wang
Yu Qiao
Xiao-hua Hu
ViT
34
11
0
10 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
41
660
0
10 Nov 2022
ViT-LSLA: Vision Transformer with Light Self-Limited-Attention
Zhenzhe Hechen
Wei Huang
Yixin Zhao
ViT
38
6
0
31 Oct 2022
Grafting Vision Transformers
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
29
2
0
28 Oct 2022
SemFormer: Semantic Guided Activation Transformer for Weakly Supervised Semantic Segmentation
Junliang Chen
Xiaodong Zhao
Cheng Luo
Linlin Shen
ViT
29
3
0
26 Oct 2022