Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2102.12122
Cited By
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
24 February 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions"
50 / 604 papers shown
Title
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Mingg-Ming Cheng
Jiashi Feng
ViT
34
129
0
22 Nov 2022
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Sifan Long
Z. Zhao
Jimin Pi
Sheng-sheng Wang
Jingdong Wang
22
29
0
21 Nov 2022
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Zhenglun Kong
Haoyu Ma
Geng Yuan
Mengshu Sun
Yanyue Xie
...
Tianlong Chen
Xiaolong Ma
Xiaohui Xie
Zhangyang Wang
Yanzhi Wang
ViT
34
22
0
19 Nov 2022
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Maoyuan Ye
Jing Zhang
Shanshan Zhao
Juhua Liu
Tongliang Liu
Bo Du
Dacheng Tao
41
71
0
19 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
30
107
0
17 Nov 2022
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
Peiyan Dong
Mengshu Sun
Alec Lu
Yanyue Xie
Li-Yu Daisy Liu
...
Xin Meng
ZeLin Li
Xue Lin
Zhenman Fang
Yanzhi Wang
ViT
34
59
0
15 Nov 2022
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer
Haokui Zhang
Wenze Hu
Xiaoyu Wang
ViT
19
8
0
14 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
30
6
0
14 Nov 2022
BiViT: Extremely Compressed Binary Vision Transformer
Yefei He
Zhenyu Lou
Luoming Zhang
Jing Liu
Weijia Wu
Hong Zhou
Bohan Zhuang
ViT
MQ
20
28
0
14 Nov 2022
AU-Aware Vision Transformers for Biased Facial Expression Recognition
Shuyi Mao
Xinpeng Li
Q. Wu
Xiaojiang Peng
ViT
36
2
0
12 Nov 2022
Interactive Context-Aware Network for RGB-T Salient Object Detection
Yuxuan Wang
Feng Dong
Jinchao Zhu
24
0
0
11 Nov 2022
Token Transformer: Can class token help window-based transformer build better long-range interactions?
Jia-ju Mao
Yuan Chang
Xuesong Yin
31
0
0
11 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
36
657
0
10 Nov 2022
Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Saghar Irandoust
Thibaut Durand
Yunduz Rakhmangulova
Wenjie Zi
Hossein Hajimirsadeghi
ViT
33
6
0
09 Nov 2022
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
Jyotikrishna Dass
Shang Wu
Huihong Shi
Chaojian Li
Zhifan Ye
Zhongfeng Wang
Yingyan Lin
17
49
0
09 Nov 2022
Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer
S. S. Nijhawan
Leo Hoshikawa
Atsushi Irie
Masakazu Yoshimura
Junji Otsuka
Takeshi Ohashi
VOT
ViT
29
0
0
09 Nov 2022
DepthFormer: Multimodal Positional Encodings and Cross-Input Attention for Transformer-Based Segmentation Networks
F. Barbato
Giulia Rizzoli
Pietro Zanuttigh
MDE
ViT
28
4
0
08 Nov 2022
ViT-CX: Causal Explanation of Vision Transformers
Weiyan Xie
Xiao-hui Li
Caleb Chen Cao
Nevin L.Zhang
ViT
29
17
0
06 Nov 2022
Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images
Yan Zhang
Xiyuan Gao
Qingyan Duan
Jiaxu Leng
Xiao Pu
Xinbo Gao
ViT
16
1
0
28 Oct 2022
Grafting Vision Transformers
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
29
2
0
28 Oct 2022
Masked Vision-Language Transformer in Fashion
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Daniel Gehrig
Luc Van Gool
21
25
0
27 Oct 2022
M
3
^3
3
ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zhangyang Wang
MoE
42
81
0
26 Oct 2022
SemFormer: Semantic Guided Activation Transformer for Weakly Supervised Semantic Segmentation
Junliang Chen
Xiaodong Zhao
Cheng Luo
Linlin Shen
ViT
27
3
0
26 Oct 2022
TPFNet: A Novel Text In-painting Transformer for Text Removal
Onkar Susladkar
Dhruv Makwana
Gayatri S Deshmukh
Sparsh Mittal
R. S. Teja
Rekha Singhal
ViT
14
3
0
26 Oct 2022
Adversarially Robust Medical Classification via Attentive Convolutional Neural Networks
I. Wasserman
OOD
MedIm
AAML
29
0
0
26 Oct 2022
Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets
Xiangyu Chen
Ying Qin
Wenju Xu
A. Bur
Cuncong Zhong
Guanghui Wang
ViT
46
3
0
25 Oct 2022
End-to-end Transformer for Compressed Video Quality Enhancement
Li Yu
Wenshuai Chang
Shiyu Wu
Moncef Gabbouj
ViT
24
8
0
25 Oct 2022
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
40
156
0
24 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Lu Zhou
Lei Wang
Zaiyan Dai
Jun Yang
ViT
34
23
0
22 Oct 2022
Face Pyramid Vision Transformer
Khawar Islam
M. Zaheer
Arif Mahmood
ViT
CVBM
24
4
0
21 Oct 2022
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
Dongsheng Chen
Chaofan Tao
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
VLM
29
18
0
21 Oct 2022
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Haoran You
Zhanyi Sun
Huihong Shi
Zhongzhi Yu
Yang Katie Zhao
Yongan Zhang
Chaojian Li
Baopu Li
Yingyan Lin
ViT
25
76
0
18 Oct 2022
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Hyeong Kyu Choi
Joonmyung Choi
Hyunwoo J. Kim
ViT
28
35
0
14 Oct 2022
Token-Label Alignment for Vision Transformers
Han Xiao
Wenzhao Zheng
Zhengbiao Zhu
Jie Zhou
Jiwen Lu
21
4
0
12 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
25
57
0
12 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
27
17
0
11 Oct 2022
Curved Representation Space of Vision Transformers
Juyeop Kim
Junha Park
Songkuk Kim
Jongseok Lee
ViT
38
6
0
11 Oct 2022
LAPFormer: A Light and Accurate Polyp Segmentation Transformer
Mai Nguyen
Tu Bui
Quan Nguyen
T. Nguyen
Toan Van Pham
ViT
3DV
MedIm
103
3
0
10 Oct 2022
Coded Residual Transform for Generalizable Deep Metric Learning
Shichao Kan
Yixiong Liang
Min Li
Yigang Cen
Jianxin Wang
Z. He
34
3
0
09 Oct 2022
Rethinking the Detection Head Configuration for Traffic Object Detection
Yi Shi
Jiang Wu
Shixuan Zhao
Gangyao Gao
T. Deng
Hongmei Yan
ObjD
24
5
0
08 Oct 2022
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
Yen-Cheng Liu
Chih-Yao Ma
Junjiao Tian
Zijian He
Z. Kira
126
47
0
07 Oct 2022
Centralized Feature Pyramid for Object Detection
Yu Quan
Dong Zhang
Liyan Zhang
Jinhui Tang
ObjD
31
148
0
05 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
39
58
0
04 Oct 2022
Implicit Warping for Animation with Image Sets
Arun Mallya
Ting-Chun Wang
Xuan Li
VGen
119
41
0
04 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
32
25
0
03 Oct 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
103
87
0
30 Sep 2022
Dilated Neighborhood Attention Transformer
Ali Hassani
Humphrey Shi
ViT
MedIm
33
68
0
29 Sep 2022
IoU-Enhanced Attention for End-to-End Task Specific Object Detection
Jing Zhao
Shengjian Wu
Li Sun
Qingli Li
33
6
0
21 Sep 2022
Dynamic Graph Message Passing Networks for Visual Recognition
Li Zhang
Mohan Chen
Anurag Arnab
Xiangyang Xue
Philip Torr
GNN
29
1
0
20 Sep 2022
Graph Reasoning Transformer for Image Parsing
Dong Zhang
Jinhui Tang
Kwang-Ting Cheng
ViT
24
16
0
20 Sep 2022
Previous
1
2
3
...
5
6
7
...
11
12
13
Next