Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2107.00652
Cited By
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
1 July 2021
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows"
50 / 440 papers shown
Title
CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation
Guoyang Zhao
Fulong Ma
Weiqing Qi
Yuxuan Liu
Ming Liu
Jun Ma
46
5
0
25 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
38
2
0
24 Mar 2024
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
59
51
0
22 Mar 2024
ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding
Novendra Setyawan
Ghufron Wahyu Kurniawan
Chi-Chia Sun
Jun-Wei Hsieh
Hui-Kai Su
W. Kuo
ViT
MoE
47
0
0
22 Mar 2024
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
Ting Yao
Yehao Li
Yingwei Pan
Tao Mei
ViT
36
15
0
18 Mar 2024
TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions
Hui Lu
A. A. Salah
Ronald Poppe
SLR
35
5
0
18 Mar 2024
Neural Markov Random Field for Stereo Matching
Tongfan Guan
Chen Wang
Yunchun Liu
3DV
21
22
0
17 Mar 2024
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
ViT
45
8
0
15 Mar 2024
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim
Kyungdon Joo
3DPC
3DV
62
1
0
08 Mar 2024
HyenaPixel: Global Image Context with Convolutions
Julian Spravil
Sebastian Houben
Sven Behnke
31
1
0
29 Feb 2024
Interactive Multi-Head Self-Attention with Linear Complexity
Hankyul Kang
Ming-Hsuan Yang
Jongbin Ryu
26
1
0
27 Feb 2024
Multi-Human Mesh Recovery with Transformers
Zeyu Wang
Zhenzhen Weng
Serena Yeung-Levy
3DH
32
1
0
26 Feb 2024
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
95
57
0
08 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balavzević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
105
24
0
08 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
41
29
0
05 Feb 2024
TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection
Tianxiang Chen
Zhentao Tan
Qi Chu
Yue-bo Wu
Bin Liu
Nenghai Yu
48
13
0
03 Feb 2024
LIR: A Lightweight Baseline for Image Restoration
Dongqi Fan
Ting Yue
Xin Zhao
Renjing Xu
Liang Chang
33
0
0
02 Feb 2024
Generating Multi-Center Classifier via Conditional Gaussian Distribution
Zhemin Zhang
Xun Gong
31
0
0
29 Jan 2024
Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention
Quang-Trung Truong
Duc Thanh Nguyen
Binh-Son Hua
Sai-Kit Yeung
VOS
36
1
0
25 Jan 2024
VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition
Xianfu Cheng
Weixiao Zhou
Xiang Li
Xiaoming Chen
Jian Yang
Tongliang Li
Zhoujun Li
37
2
0
18 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
32
5
0
18 Jan 2024
SymTC: A Symbiotic Transformer-CNN Net for Instance Segmentation of Lumbar Spine MRI
Jiasong Chen
Linchen Qian
Linhai Ma
Timur Urakov
Weiyong Gu
Liang Liang
MedIm
39
4
0
17 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
50
719
0
17 Jan 2024
Fully Attentional Networks with Self-emerging Token Labeling
Bingyin Zhao
Zhiding Yu
Shiyi Lan
Yutao Cheng
A. Anandkumar
Yingjie Lao
Jose M. Alvarez
986
6
0
08 Jan 2024
SeTformer is What You Need for Vision and Language
Pourya Shamsolmoali
Masoumeh Zareapoor
Eric Granger
Michael Felsberg
49
4
0
07 Jan 2024
BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation
Libin Lan
Pengzhou Cai
Lu Jiang
Xiaojuan Liu
Yongmei Li
Yudong Zhang
ViT
MedIm
29
6
0
01 Jan 2024
PanGu-
π
π
π
: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
73
17
0
27 Dec 2023
Deformable Audio Transformer for Audio Event Detection
Wentao Zhu
28
0
0
24 Dec 2023
ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding
Lunhao Duan
Shanshan Zhao
Nan Xue
Biwei Huang
Gui-Song Xia
Dacheng Tao
ViT
37
18
0
18 Dec 2023
Agent Attention: On the Integration of Softmax and Linear Attention
Dongchen Han
Tianzhu Ye
Yizeng Han
Zhuofan Xia
Siyuan Pan
Pengfei Wan
Shiji Song
Gao Huang
45
75
0
14 Dec 2023
Auto-Prox: Training-Free Vision Transformer Architecture Search via Automatic Proxy Discovery
Zimian Wei
Lujun Li
Peijie Dong
Zheng Hui
Anggeng Li
Menglong Lu
H. Pan
Zhiliang Tian
Dongsheng Li
ViT
50
16
0
14 Dec 2023
MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation
Abdullah Rashwan
Jiageng Zhang
A. Taalimi
Fan Yang
Xingyi Zhou
Chaochao Yan
Liang-Chieh Chen
Yeqing Li
ViT
31
5
0
11 Dec 2023
Transformer-based Selective Super-Resolution for Efficient Image Refinement
Tianyi Zhang
Kishore Kasichainula
Yaoxin Zhuo
Baoxin Li
Jae-sun Seo
Yu Cao
28
7
0
10 Dec 2023
The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need
Tianjin Huang
Tianlong Chen
Zhangyang Wang
Shiwei Liu
34
1
0
09 Dec 2023
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang
Jieru Mei
Alan Yuille
VLM
35
55
0
04 Dec 2023
Token Fusion: Bridging the Gap between Token Pruning and Token Merging
Minchul Kim
Shangqian Gao
Yen-Chang Hsu
Yilin Shen
Hongxia Jin
31
32
0
02 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
49
0
0
01 Dec 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
23
1
0
29 Nov 2023
PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution
Zuoyan Zhao
Hui Xue
Pengfei Fang
Shipeng Zhu
DiffM
26
4
0
29 Nov 2023
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
ViT
23
77
0
28 Nov 2023
Cross-level Attention with Overlapped Windows for Camouflaged Object Detection
Jiepan Li
Fangxiao Lu
Nan Xue
Zhuo Li
Hongyan Zhang
Wei He
43
2
0
28 Nov 2023
Advancing Vision Transformers with Group-Mix Attention
Chongjian Ge
Xiaohan Ding
Zhan Tong
Li Yuan
Jiangliu Wang
Yibing Song
Ping Luo
112
16
0
26 Nov 2023
Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices
Gaoxiang Duan
Junkai Zhang
Xiaoying Zheng
Yongxin Zhu
36
2
0
22 Nov 2023
Deep Tensor Network
Yifan Zhang
32
0
0
18 Nov 2023
Vision Big Bird: Random Sparsification for Full Attention
Zhemin Zhang
Xun Gong
ViT
21
1
0
10 Nov 2023
Window Attention is Bugged: How not to Interpolate Position Embeddings
Daniel Bolya
Chaitanya K. Ryali
Judy Hoffman
Christoph Feichtenhofer
48
10
0
09 Nov 2023
GPT-ST: Generative Pre-Training of Spatio-Temporal Graph Neural Networks
Zhonghang Li
Lianghao Xia
Yong-mei Xu
Chao Huang
AI4TS
AI4CE
93
25
0
07 Nov 2023
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Xuwei Xu
Sen Wang
Yudong Chen
Yanping Zheng
Zhewei Wei
Jiajun Liu
ViT
30
9
0
06 Nov 2023
Scattering Vision Transformer: Spectral Mixing Matters
Badri N. Patro
Vijay Srinivas Agneeswaran
41
14
0
02 Nov 2023
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
Meng Lou
Hong-Yu Zhou
Sibei Yang
Yizhou Yu
Chuan Wu
Yizhou Yu
ViT
49
36
0
30 Oct 2023
Previous
1
2
3
4
5
6
7
8
9
Next