ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.16302
  4. Cited By
Rethinking Spatial Dimensions of Vision Transformers

Rethinking Spatial Dimensions of Vision Transformers

30 March 2021
Byeongho Heo
Sangdoo Yun
Dongyoon Han
Sanghyuk Chun
Junsuk Choe
Seong Joon Oh
    ViT
ArXivPDFHTML

Papers citing "Rethinking Spatial Dimensions of Vision Transformers"

50 / 307 papers shown
Title
EfficientFormer: Vision Transformers at MobileNet Speed
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
23
347
0
02 Jun 2022
Surface Analysis with Vision Transformers
Surface Analysis with Vision Transformers
Simon Dahan
Logan Z. J. Williams
Abdulah Fawaz
Daniel Rueckert
E. C. Robinson
ViT
MedIm
29
2
0
31 May 2022
Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on
  Small Datasets
Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on Small Datasets
Leandro M. de Lima
R. Krohling
ViT
MedIm
28
10
0
30 May 2022
A Closer Look at Self-Supervised Lightweight Vision Transformers
A Closer Look at Self-Supervised Lightweight Vision Transformers
Shaoru Wang
Jin Gao
Zeming Li
Jian Sun
Weiming Hu
ViT
67
41
0
28 May 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
20
20
0
28 May 2022
Scalable and Efficient Training of Large Convolutional Neural Networks
  with Differential Privacy
Scalable and Efficient Training of Large Convolutional Neural Networks with Differential Privacy
Zhiqi Bu
J. Mao
Shiyun Xu
136
47
0
21 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision
  Transformers with Locality
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
113
73
0
20 May 2022
TRT-ViT: TensorRT-oriented Vision Transformer
TRT-ViT: TensorRT-oriented Vision Transformer
Xin Xia
Jiashi Li
Jie Wu
Xing Wang
Xuefeng Xiao
Min Zheng
Rui Wang
ViT
23
27
0
19 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised
  Semantic Segmentation and Localization
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
28
159
0
16 May 2022
Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel
  Transformer
Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer
Wu Yun
Mengshi Qi
Chuanming Wang
Huiyuan Fu
Huadong Ma
ViT
11
6
0
30 Apr 2022
DearKD: Data-Efficient Early Knowledge Distillation for Vision
  Transformers
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Xianing Chen
Qiong Cao
Yujie Zhong
Jing Zhang
Shenghua Gao
Dacheng Tao
ViT
37
76
0
27 Apr 2022
Understanding The Robustness in Vision Transformers
Understanding The Robustness in Vision Transformers
Daquan Zhou
Zhiding Yu
Enze Xie
Chaowei Xiao
Anima Anandkumar
Jiashi Feng
J. Álvarez
ViT
22
185
0
26 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
21
4
0
26 Apr 2022
Visual Attention Emerges from Recurrent Sparse Reconstruction
Visual Attention Emerges from Recurrent Sparse Reconstruction
Baifeng Shi
Ya-heng Song
Neel Joshi
Trevor Darrell
Xin Wang
3DH
14
6
0
23 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
VSA: Learning Varied-Size Window Attention in Vision Transformers
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
22
53
0
18 Apr 2022
An Extendable, Efficient and Effective Transformer-based Object Detector
An Extendable, Efficient and Effective Transformer-based Object Detector
Hwanjun Song
Deqing Sun
Sanghyuk Chun
Varun Jampani
Dongyoon Han
Byeongho Heo
Wonjae Kim
Ming-Hsuan Yang
19
13
0
17 Apr 2022
DeiT III: Revenge of the ViT
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
42
389
0
14 Apr 2022
Residual Swin Transformer Channel Attention Network for Image
  Demosaicing
Residual Swin Transformer Channel Attention Network for Image Demosaicing
W. Xing
K. Egiazarian
ViT
19
14
0
14 Apr 2022
DaViT: Dual Attention Vision Transformers
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
42
240
0
07 Apr 2022
Learning Local and Global Temporal Contexts for Video Semantic
  Segmentation
Learning Local and Global Temporal Contexts for Video Semantic Segmentation
Guolei Sun
Yun Liu
Henghui Ding
Min Wu
Luc Van Gool
30
32
0
07 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for
  Object Detection
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
31
55
0
06 Apr 2022
Improving Vision Transformers by Revisiting High-frequency Components
Improving Vision Transformers by Revisiting High-frequency Components
Jiawang Bai
Liuliang Yuan
Shutao Xia
Shuicheng Yan
Zhifeng Li
Wei Liu
ViT
16
90
0
03 Apr 2022
Exploring Plain Vision Transformer Backbones for Object Detection
Exploring Plain Vision Transformer Backbones for Object Detection
Yanghao Li
Hanzi Mao
Ross B. Girshick
Kaiming He
ViT
33
775
0
30 Mar 2022
ITTR: Unpaired Image-to-Image Translation with Transformers
ITTR: Unpaired Image-to-Image Translation with Transformers
Wanfeng Zheng
Qiang Li
Guoxin Zhang
Pengfei Wan
Zhong-ming Wang
ViT
40
17
0
30 Mar 2022
CNN Filter DB: An Empirical Investigation of Trained Convolutional
  Filters
CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters
Paul Gavrikov
J. Keuper
AAML
21
31
0
29 Mar 2022
ObjectFormer for Image Manipulation Detection and Localization
ObjectFormer for Image Manipulation Detection and Localization
Junke Wang
Zuxuan Wu
Jingjing Chen
Xintong Han
Abhinav Shrivastava
Ser-Nam Lim
Yu-Gang Jiang
28
108
0
28 Mar 2022
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token
  Selection
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang
A. Khetan
Rene Bidart
Zohar Karnin
19
14
0
27 Mar 2022
Training-free Transformer Architecture Search
Training-free Transformer Architecture Search
Qinqin Zhou
Kekai Sheng
Xiawu Zheng
Ke Li
Xing Sun
Yonghong Tian
Jie Chen
Rongrong Ji
ViT
32
46
0
23 Mar 2022
Attribute Surrogates Learning and Spectral Tokens Pooling in
  Transformers for Few-shot Learning
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
Yang He
Weihan Liang
Dongyang Zhao
Hong-Yu Zhou
Weifeng Ge
Yizhou Yu
Wenqiang Zhang
ViT
27
45
0
17 Mar 2022
HUMUS-Net: Hybrid unrolled multi-scale network architecture for
  accelerated MRI reconstruction
HUMUS-Net: Hybrid unrolled multi-scale network architecture for accelerated MRI reconstruction
Zalan Fabian
Berk Tinaz
Mahdi Soltanolkotabi
30
50
0
15 Mar 2022
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets
  and Transformer
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer
Haokui Zhang
Wenze Hu
Xiaoyu Wang
ViT
41
59
0
08 Mar 2022
Dynamic Group Transformer: A General Vision Transformer Backbone with
  Dynamic Group Attention
Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention
Kai Liu
Tianyi Wu
Cong Liu
Guodong Guo
ViT
41
17
0
08 Mar 2022
CF-ViT: A General Coarse-to-Fine Method for Vision Transformer
CF-ViT: A General Coarse-to-Fine Method for Vision Transformer
Mengzhao Chen
Mingbao Lin
Ke Li
Yunhang Shen
Yongjian Wu
Rongrong Ji
Rongrong Ji
ViT
43
60
0
08 Mar 2022
Multi-Tailed Vision Transformer for Efficient Inference
Multi-Tailed Vision Transformer for Efficient Inference
Yunke Wang
Bo Du
Wenyuan Wang
Chang Xu
ViT
213
6
0
03 Mar 2022
Person Re-identification: A Retrospective on Domain Specific Open
  Challenges and Future Trends
Person Re-identification: A Retrospective on Domain Specific Open Challenges and Future Trends
Asma Zahra
N. Perwaiz
Muhammad Shahzad
M. Fraz
114
60
0
26 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
  Image Recognition and Beyond
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
33
229
0
21 Feb 2022
Not All Patches are What You Need: Expediting Vision Transformers via
  Token Reorganizations
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations
Youwei Liang
Chongjian Ge
Zhan Tong
Yibing Song
Jue Wang
P. Xie
ViT
25
236
0
16 Feb 2022
How Do Vision Transformers Work?
How Do Vision Transformers Work?
Namuk Park
Songkuk Kim
ViT
41
465
0
14 Feb 2022
Learning Features with Parameter-Free Layers
Learning Features with Parameter-Free Layers
Dongyoon Han
Y. Yoo
Beomyoung Kim
Byeongho Heo
35
8
0
06 Feb 2022
BOAT: Bilateral Local Attention Vision Transformer
BOAT: Bilateral Local Attention Vision Transformer
Tan Yu
Gangming Zhao
Ping Li
Yizhou Yu
ViT
33
27
0
31 Jan 2022
Plug-In Inversion: Model-Agnostic Inversion for Vision with Data
  Augmentations
Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations
Amin Ghiasi
Hamid Kazemi
Steven Reich
Chen Zhu
Micah Goldblum
Tom Goldstein
48
15
0
31 Jan 2022
Generalised Image Outpainting with U-Transformer
Generalised Image Outpainting with U-Transformer
Penglei Gao
Xi Yang
Rui Zhang
John Y. Goulermas
Yujie Geng
Yuyao Yan
Kaizhu Huang
ViT
19
17
0
27 Jan 2022
VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit
  Vision Transformer
VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer
Mengshu Sun
Haoyu Ma
Guoliang Kang
Yi Ding
Tianlong Chen
Xiaolong Ma
Zhangyang Wang
Yanzhi Wang
ViT
33
45
0
17 Jan 2022
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
Zhenglun Kong
Peiyan Dong
Xiaolong Ma
Xin Meng
Mengshu Sun
...
Geng Yuan
Bin Ren
Minghai Qin
H. Tang
Yanzhi Wang
ViT
34
144
0
27 Dec 2021
Augmenting Convolutional networks with attention-based aggregation
Augmenting Convolutional networks with attention-based aggregation
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Piotr Bojanowski
Armand Joulin
Gabriel Synnaeve
Hervé Jégou
ViT
38
47
0
27 Dec 2021
Vision Transformer for Small-Size Datasets
Vision Transformer for Small-Size Datasets
Seung Hoon Lee
Seunghyun Lee
B. Song
ViT
22
222
0
27 Dec 2021
MIA-Former: Efficient and Robust Vision Transformers via Multi-grained
  Input-Adaptation
MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation
Zhongzhi Yu
Y. Fu
Sicheng Li
Chaojian Li
Yingyan Lin
ViT
33
19
0
21 Dec 2021
A Simple Single-Scale Vision Transformer for Object Localization and
  Instance Segmentation
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
Wuyang Chen
Xianzhi Du
Fan Yang
Lucas Beyer
Xiaohua Zhai
...
Huizhong Chen
Jing Li
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
29
20
0
17 Dec 2021
Joint Global and Local Hierarchical Priors for Learned Image Compression
Joint Global and Local Hierarchical Priors for Learned Image Compression
Jun-Hyuk Kim
Byeongho Heo
Jong-Seok Lee
31
82
0
08 Dec 2021
Ablation study of self-supervised learning for image classification
Ablation study of self-supervised learning for image classification
Ilias Papastratis
SSL
12
1
0
04 Dec 2021
Previous
1234567
Next