Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.16302
Cited By
Rethinking Spatial Dimensions of Vision Transformers
30 March 2021
Byeongho Heo
Sangdoo Yun
Dongyoon Han
Sanghyuk Chun
Junsuk Choe
Seong Joon Oh
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Rethinking Spatial Dimensions of Vision Transformers"
50 / 307 papers shown
Title
Parameter-Inverted Image Pyramid Networks
Xizhou Zhu
Xue Yang
Zhaokai Wang
Hao Li
Wenhan Dou
Junqi Ge
Lewei Lu
Yu Qiao
Jifeng Dai
47
0
0
06 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
48
6
0
06 Jun 2024
The Deep Latent Space Particle Filter for Real-Time Data Assimilation with Uncertainty Quantification
N. T. Mücke
Sander M. Bohté
C. Oosterlee
39
0
0
04 Jun 2024
Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling
Chunlin Qiu
Yiheng Duan
Lingchen Zhao
Qian Wang
AAML
37
2
0
25 May 2024
Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers
Bum Jun Kim
Sang Woo Kim
ViT
43
1
0
23 May 2024
Learning to Transform Dynamically for Better Adversarial Transferability
Rongyi Zhu
Zeliang Zhang
Susan Liang
Zhuo Liu
Chenliang Xu
AAML
39
14
0
23 May 2024
Evaluating Adversarial Robustness in the Spatial Frequency Domain
Keng-Hsin Liao
Chin-Yuan Yeh
Hsi-Wen Chen
Ming-Syan Chen
26
0
0
10 May 2024
Improving Transferable Targeted Adversarial Attack via Normalized Logit Calibration and Truncated Feature Mixing
Juanjuan Weng
Zhiming Luo
Shaozi Li
AAML
36
0
0
10 May 2024
Exploring Frequencies via Feature Mixing and Meta-Learning for Improving Adversarial Transferability
Juanjuan Weng
Zhiming Luo
Shaozi Li
AAML
39
1
0
06 May 2024
U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers
Yuchuan Tian
Zhijun Tu
Hanting Chen
Jie Hu
Chao Xu
Yunhe Wang
38
16
0
04 May 2024
CA-Stream: Attention-based pooling for interpretable image recognition
Felipe Torres
Hanwei Zhang
R. Sicre
Stéphane Ayache
Yannis Avrithis
52
0
0
23 Apr 2024
Data-independent Module-aware Pruning for Hierarchical Vision Transformers
Yang He
Qiufeng Wang
ViT
50
3
0
21 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
47
1
0
18 Apr 2024
GhostNetV3: Exploring the Training Strategies for Compact Models
Zhenhua Liu
Zhiwei Hao
Kai Han
Yehui Tang
Yunhe Wang
32
16
0
17 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jienneg Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
36
24
0
02 Apr 2024
SpiralMLP: A Lightweight Vision MLP Architecture
Haojie Mu
Burhan Ul Tayyab
Nicholas Chua
43
0
0
31 Mar 2024
On Inherent Adversarial Robustness of Active Vision Systems
Amitangshu Mukherjee
Timur Ibrayev
Kaushik Roy
AAML
36
0
0
29 Mar 2024
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng
Xiaoyong Chen
Jiaming Liang
Huisi Wu
Guangzhong Cao
Yong Guo
AAML
39
3
0
29 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
51
7
0
28 Mar 2024
Rotary Position Embedding for Vision Transformer
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
31
34
0
20 Mar 2024
When Training-Free NAS Meets Vision Transformer: A Neural Tangent Kernel Perspective
Qiqi Zhou
Yichen Zhu
ViT
16
1
0
15 Mar 2024
Group-Mix SAM: Lightweight Solution for Industrial Assembly Line Applications
Wu Liang
X.-G. Ma
36
0
0
15 Mar 2024
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
ViT
45
7
0
15 Mar 2024
Attention-aware Semantic Communications for Collaborative Inference
Jiwoong Im
Nayoung Kwon
Taewoo Park
Jiheon Woo
Jaeho Lee
Yongjune Kim
46
2
0
23 Feb 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller
Krista A. Ehinger
Tom Drummond
46
1
0
19 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
34
28
0
05 Feb 2024
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Wei Ye
Chaoya Jiang
Haiyang Xu
Chenhao Ye
Chenliang Li
Mingshi Yan
Shikun Zhang
Songhang Huang
Fei Huang
VLM
29
0
0
11 Jan 2024
Fully Attentional Networks with Self-emerging Token Labeling
Bingyin Zhao
Zhiding Yu
Shiyi Lan
Yutao Cheng
A. Anandkumar
Yingjie Lao
Jose M. Alvarez
980
6
0
08 Jan 2024
Multi-Attention Fusion Drowsy Driving Detection Model
Shulei Qu
Zhenguo Gao
Sissi Xiaoxiao Wu
Yuanyuan Qiu
CVBM
23
1
0
28 Dec 2023
PanGu-
π
π
π
: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
67
15
0
27 Dec 2023
Adaptive Depth Networks with Skippable Sub-Paths
Woochul Kang
33
1
0
27 Dec 2023
AutoAugment Input Transformation for Highly Transferable Targeted Attacks
Haobo Lu
Xin Liu
Kun He
AAML
16
0
0
21 Dec 2023
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao
Zhan Tong
K. Lin
Joya Chen
Mike Zheng Shou
35
0
0
04 Dec 2023
Rethinking Mixup for Improving the Adversarial Transferability
Xiaosen Wang
Zeyuan Yin
AAML
30
2
0
28 Nov 2023
FMViT: A multiple-frequency mixing Vision Transformer
Wei Tan
Yifeng Geng
Xuansong Xie
ViT
24
3
0
09 Nov 2023
SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers
Xiangyong Lu
Masanori Suganuma
Takayuki Okatani
38
10
0
07 Nov 2023
FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion
Xinhao Xiang
Jiawei Zhang
3DPC
ViT
39
1
0
07 Nov 2023
Medi-CAT: Contrastive Adversarial Training for Medical Image Classification
Pervaiz Iqbal Khan
Andreas Dengel
Sheraz Ahmed
MedIm
18
3
0
31 Oct 2023
Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders
Srijan Das
Tanmay Jain
Dominick Reilly
P. Balaji
Soumyajit Karmakar
Shyam Marjit
Xiang Li
Abhijit Das
Michael S. Ryoo
39
16
0
31 Oct 2023
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Karsten Roth
Lukas Thede
Almut Sophia Koepke
Oriol Vinyals
Olivier J. Hénaff
Zeynep Akata
AAML
24
11
0
26 Oct 2023
Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers
Xuwei Xu
Sen Wang
Yudong Chen
Jiajun Liu
ViT
21
1
0
09 Oct 2023
LumiNet: The Bright Side of Perceptual Knowledge Distillation
Md. Ismail Hossain
M. M. L. Elahi
Sameera Ramasinghe
A. Cheraghian
Fuad Rahman
Nabeel Mohammed
Shafin Rahman
29
1
0
05 Oct 2023
PPT: Token Pruning and Pooling for Efficient Vision Transformers
Xinjian Wu
Fanhu Zeng
Xiudong Wang
Xinghao Chen
ViT
24
22
0
03 Oct 2023
Trading-off Mutual Information on Feature Aggregation for Face Recognition
Mohammad Akyash
Ali Zafari
Nasser M. Nasrabadi
ViT
25
1
0
22 Sep 2023
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer
Fudong Lin
Summer Crawford
Kaleb Guillot
Yihe Zhang
Yan Chen
...
Tri Setiyono
B. Tubana
Lu Peng
Magdy A. Bayoumi
N. Tzeng
42
20
0
16 Sep 2023
Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?
Bill Psomas
Ioannis Kakogeorgiou
Konstantinos Karantzalos
Yannis Avrithis
ViT
38
8
0
13 Sep 2023
Dynamic Spectrum Mixer for Visual Recognition
Zhiqiang Hu
Tao Yu
30
3
0
13 Sep 2023
Toward a Deeper Understanding: RetNet Viewed through Convolution
Chenghao Li
Chaoning Zhang
ViT
35
7
0
11 Sep 2023
ExMobileViT: Lightweight Classifier Extension for Mobile Vision Transformer
Gyeongdong Yang
Yungwook Kwon
Hyunjin Kim
ViT
27
1
0
04 Sep 2023
Towards a Rigorous Analysis of Mutual Information in Contrastive Learning
Kyungeun Lee
Jaeill Kim
Suhyun Kang
Wonjong Rhee
SSL
33
2
0
30 Aug 2023
Previous
1
2
3
4
5
6
7
Next