Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.02034
Cited By
v1
v2 (latest)
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
3 June 2021
Yongming Rao
Wenliang Zhao
Benlin Liu
Jiwen Lu
Jie Zhou
Cho-Jui Hsieh
ViT
Re-assign community
ArXiv (abs)
PDF
HTML
Github (608★)
Papers citing
"DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification"
50 / 444 papers shown
Title
Data Level Lottery Ticket Hypothesis for Vision Transformers
Xuan Shen
Zhenglun Kong
Minghai Qin
Peiyan Dong
Geng Yuan
Xin Meng
Hao Tang
Xiaolong Ma
Yanzhi Wang
87
6
0
02 Nov 2022
ProContEXT: Exploring Progressive Context Transformer for Tracking
Jinpeng Lan
Zhi-Qi Cheng
Ju He
Chenyang Li
Bin Luo
Xueting Bao
Wangmeng Xiang
Yifeng Geng
Xuansong Xie
102
31
0
27 Oct 2022
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
132
474
0
17 Oct 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
107
5
0
15 Oct 2022
CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
Denis Kuznedelev
Eldar Kurtic
Elias Frantar
Dan Alistarh
VLM
ViT
65
13
0
14 Oct 2022
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Hyeong Kyu Choi
Joonmyung Choi
Hyunwoo J. Kim
ViT
85
37
0
14 Oct 2022
Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
Yanjing Li
Sheng Xu
Baochang Zhang
Xianbin Cao
Penglei Gao
Guodong Guo
MQ
ViT
103
95
0
13 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
41
19
0
11 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng Zhang
Chao Zhang
Hanhua Hu
114
31
0
03 Oct 2022
Learning Hierarchical Image Segmentation For Recognition and By Recognition
Tsung-Wei Ke
Sangwoo Mo
Stella X. Yu
VLM
136
11
0
01 Oct 2022
Effective Vision Transformer Training: A Data-Centric Perspective
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
69
5
0
29 Sep 2022
Dilated Neighborhood Attention Transformer
Ali Hassani
Humphrey Shi
ViT
MedIm
112
73
0
29 Sep 2022
Spikformer: When Spiking Neural Network Meets Transformer
Zhaokun Zhou
Yuesheng Zhu
Chao He
Yaowei Wang
Shuicheng Yan
Yonghong Tian
Liuliang Yuan
227
264
0
29 Sep 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
95
28
0
28 Sep 2022
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention
Xiangcheng Liu
Tianyi Wu
Guodong Guo
ViT
129
31
0
28 Sep 2022
Attacking Compressed Vision Transformers
Swapnil Parekh
Devansh Shah
Pratyush Shukla
AAML
48
1
0
28 Sep 2022
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation
Haoyu Ma
Zhe Wang
Yifei Chen
Deying Kong
Liangjian Chen
Xingwei Liu
Xiangyi Yan
Hao Tang
Xiaohui Xie
ViT
87
48
0
16 Sep 2022
Hydra Attention: Efficient Attention with Many Heads
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Judy Hoffman
145
80
0
15 Sep 2022
Accelerating Vision Transformer Training via a Patch Sampling Schedule
Bradley McDanel
C. Huynh
ViT
74
1
0
19 Aug 2022
PatchDropout: Economizing Vision Transformers Using Patch Dropout
Yue Liu
Christos Matsoukas
Fredrik Strand
Hossein Azizpour
Kevin Smith
64
24
0
10 Aug 2022
Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers
Junhyeong Cho
Youwang Kim
Tae-Hyun Oh
ViT
104
123
0
27 Jul 2022
Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation
Jiaming Zhang
Kailun Yang
Haowen Shi
Simon Reiß
Kunyu Peng
Chaoxiang Ma
Haodong Fu
Philip H. S. Torr
Kaiwei Wang
Rainer Stiefelhagen
ViT
MDE
103
39
0
25 Jul 2022
Towards Efficient Adversarial Training on Vision Transformers
Boxi Wu
Jindong Gu
Zhifeng Li
Deng Cai
Xiaofei He
Wei Liu
ViT
AAML
87
40
0
21 Jul 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
95
121
0
16 Jul 2022
Should All Proposals be Treated Equally in Object Detection?
Yunsheng Li
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Pei Yu
Jing Yin
Lu Yuan
Zicheng Liu
Nuno Vasconcelos
ObjD
36
4
0
07 Jul 2022
CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Runsheng Xu
Zhengzhong Tu
Hao Xiang
Wei Shao
Bolei Zhou
Jiaqi Ma
142
227
0
05 Jul 2022
Efficient Representation Learning via Adaptive Context Pooling
Chen Huang
Walter A. Talbott
Navdeep Jaitly
J. Susskind
49
6
0
05 Jul 2022
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks
Yongming Rao
Zuyan Liu
Wenliang Zhao
Jie Zhou
Jiwen Lu
ViT
86
38
0
04 Jul 2022
The Lighter The Better: Rethinking Transformers in Medical Image Segmentation Through Adaptive Pruning
Xian Lin
Li Yu
Kwang-Ting Cheng
Zengqiang Yan
ViT
MedIm
62
35
0
29 Jun 2022
Toward Unpaired Multi-modal Medical Image Segmentation via Learning Structured Semantic Consistency
Jie Yang
Ruimao Zhang
Chao Wang
Zhuguo Li
Lingyan Zhang
59
11
0
21 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
92
26
0
17 Jun 2022
Video Capsule Endoscopy Classification using Focal Modulation Guided Convolutional Neural Network
Abhishek Srivastava
Nikhil Kumar Tomar
Ulas Bagci
Debesh Jha
MedIm
57
15
0
16 Jun 2022
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
102
57
0
14 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
105
265
0
06 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
135
371
0
02 Jun 2022
Dynamic Linear Transformer for 3D Biomedical Image Segmentation
Zheyu Zhang
Ulas Bagci
ViT
MedIm
87
12
0
01 Jun 2022
CompleteDT: Point Cloud Completion with Dense Augment Inference Transformers
Jun Li
Shangwei Guo
Shaokun Han
ViT
40
3
0
30 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
125
72
0
26 May 2022
Super Vision Transformer
Mingbao Lin
Mengzhao Chen
Yuxin Zhang
Yunhang Shen
Rongrong Ji
Liujuan Cao
ViT
125
21
0
23 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
Łukasz Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
94
196
0
06 May 2022
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
Shuai Zhao
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
CLIP
74
121
0
02 May 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
104
6
0
26 Apr 2022
Multimodal Token Fusion for Vision Transformers
Yikai Wang
Xinghao Chen
Lele Cao
Wen-bing Huang
Gang Hua
Yunhe Wang
ViT
100
183
0
19 Apr 2022
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
Wang Zeng
Sheng Jin
Wentao Liu
Chao Qian
Ping Luo
Ouyang Wanli
Xiaogang Wang
ViT
103
127
0
19 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
88
127
0
14 Apr 2022
MaxViT: Multi-Axis Vision Transformer
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
140
674
0
04 Apr 2022
Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes
Samrudhdhi B. Rangrej
C. Srinidhi
J. Clark
67
12
0
01 Apr 2022
Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework
Botao Ye
Hong Chang
Bingpeng Ma
Shiguang Shan
Xilin Chen
ViT
111
485
0
22 Mar 2022
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
109
279
0
22 Mar 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
160
56
0
21 Mar 2022
Previous
1
2
3
4
5
6
7
8
9
Next