2102.12122
Cited By
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
24 February 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
Papers citing
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions"
50 / 604 papers shown
Title
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li
Hao Wu
Sijia Liu
Pin-Yu Chen
ViT
MLT
37
57
0
12 Feb 2023
Efficient Attention via Control Variates
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
34
18
0
09 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
Cheng Chen
Mu Li
ViT
58
144
0
06 Feb 2023
AMD-HookNet for Glacier Front Segmentation
Fei Wu
Nora Gourmelon
T. Seehaus
Jianlin Zhang
M. Braun
Andreas Maier
Vincent Christlein
24
9
0
06 Feb 2023
Out of Distribution Performance of State of Art Vision Model
Salman Rahman
W. Lee
37
2
0
25 Jan 2023
Dynamic Grained Encoder for Vision Transformers
Lin Song
Songyang Zhang
Songtao Liu
Zeming Li
Xuming He
Hongbin Sun
Jian Sun
Nanning Zheng
ViT
26
34
0
10 Jan 2023
DeMT: Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction
Yang Yang
Yibo Yang
L. Zhang
ViT
33
51
0
09 Jan 2023
RGB-T Multi-Modal Crowd Counting Based on Transformer
Zhengyi Liu
Wei Wu
liuzywen
Guanghui Zhang
ViT
18
11
0
08 Jan 2023
Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashanka Venkataramanan
Amir Ghodrati
Yuki M. Asano
Fatih Porikli
A. Habibian
ViT
18
25
0
05 Jan 2023
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
Sucheng Ren
Fangyun Wei
Zheng-Wei Zhang
Han Hu
40
34
0
03 Jan 2023
A New Perspective to Boost Vision Transformer for Medical Image Classification
Yuexiang Li
Yawen Huang
Nanjun He
Kai Ma
Yefeng Zheng
ViT
MedIm
21
3
0
03 Jan 2023
Edge Enhanced Image Style Transfer via Transformers
Chi Zhang
Jun Yang
Zaiyan Dai
Peng-Xia Cao
16
10
0
02 Jan 2023
Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification
Ziyi Tang
Ruimao Zhang
Zhanglin Peng
Jinrui Chen
Liang Lin
33
18
0
02 Jan 2023
Pseudo-Inverted Bottleneck Convolution for DARTS Search Space
Arash Ahmadian
Louis S.P. Liu
Yue Fei
Konstantinos N. Plataniotis
Mahdi S. Hosseini
21
0
0
31 Dec 2022
Local Learning on Transformers via Feature Reconstruction
P. Pathak
Jingwei Zhang
Dimitris Samaras
ViT
24
5
0
29 Dec 2022
Exploring Vision Transformers as Diffusion Learners
He Cao
Jianan Wang
Tianhe Ren
Xianbiao Qi
Yihao Chen
Yuan Yao
L. Zhang
44
10
0
28 Dec 2022
Representation Separation for Semantic Segmentation with Vision Transformers
Yuanduo Hong
Huihui Pan
Weichao Sun
Xinghu Yu
Huijun Gao
ViT
28
5
0
28 Dec 2022
MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
Wei Ji
Long Chen
Yin-wei Wei
Yiming Wu
Tat-Seng Chua
AI4TS
29
18
0
26 Dec 2022
SMMix: Self-Motivated Image Mixing for Vision Transformers
Mengzhao Chen
Mingbao Lin
Zhihang Lin
Yu-xin Zhang
Rongrong Ji
53
10
0
26 Dec 2022
A Close Look at Spatial Modeling: From Attention to Convolution
Xu Ma
Huan Wang
Can Qin
Kunpeng Li
Xing Zhao
Jie Fu
Yun Fu
ViT
3DPC
25
11
0
23 Dec 2022
DQnet: Cross-Model Detail Querying for Camouflaged Object Detection
Wei Sun
Chengao Liu
Linyan Zhang
Yu Li
Pengxu Wei
Chang-rui Liu
J. Zou
Jianbin Jiao
QiXiang Ye
48
6
0
16 Dec 2022
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
35
159
0
15 Dec 2022
Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
Loic Themyr
Clément Rambour
Nicolas Thome
Toby Collins
Alexandre Hostettler
ViT
27
10
0
15 Dec 2022
THMA: Tencent HD Map AI System for Creating HD Map Annotations
Kun Tang
Xu Cao
Zhipeng Cao
Tongxi Zhou
Erlong Li
...
Shengtao Zou
Chang-ling Liu
Shuqi Mei
Elena Sizikova
Chao Zheng
17
12
0
14 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
Xinyu Wang
ViT
38
21
0
13 Dec 2022
FastMIM: Expediting Masked Image Modeling Pre-training for Vision
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Yunhe Wang
Chang Xu
33
9
0
13 Dec 2022
Video Prediction by Efficient Transformers
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
39
33
0
12 Dec 2022
ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient Self-Supervised Monocular Depth Estimation
Daitao Xing
Jinglin Shen
C. Ho
Anthony Tzes
ViT
MDE
31
4
0
12 Dec 2022
Position Embedding Needs an Independent Layer Normalization
Runyi Yu
Zhennan Wang
Yinhuai Wang
Kehan Li
Yian Zhao
Jian Zhang
Guoli Song
Jie Chen
31
1
0
10 Dec 2022
CamoFormer: Masked Separable Attention for Camouflaged Object Detection
Bo Yin
Xuying Zhang
Qibin Hou
Bo Sun
Deng-Ping Fan
Luc Van Gool
28
51
0
10 Dec 2022
Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images
L. Ding
Jing Zhang
Kai Zhang
Haitao Guo
Bing Liu
Lorenzo Bruzzone
23
47
0
10 Dec 2022
ViTPose++: Vision Transformer for Generic Body Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
42
40
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
36
54
0
06 Dec 2022
Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar
Jason Clemons
Rangharajan Venkatesan
S. Keckler
M. Horowitz
26
2
0
06 Dec 2022
Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering
D. M. Nguyen
Hoangvu Nguyen
M. T. N. Truong
T. Cao
Binh Duc Nguyen
Nhat Ho
Paul Swoboda
Shadi Albarqouni
P. Xie
Daniel Sonntag
SSL
24
21
0
04 Dec 2022
Part-based Face Recognition with Vision Transformers
Zhonglin Sun
Georgios Tzimiropoulos
ViT
25
15
0
30 Nov 2022
Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing
Nataniel Ruiz
Sarah Adel Bargal
Cihang Xie
Kate Saenko
Stan Sclaroff
ViT
36
5
0
29 Nov 2022
Lightweight Structure-Aware Attention for Visual Understanding
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Alahari Karteek
28
2
0
29 Nov 2022
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
Yijiang Liu
Huanrui Yang
Zhen Dong
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
31
46
0
29 Nov 2022
QuadFormer: Quadruple Transformer for Unsupervised Domain Adaptation in Power Line Segmentation of Aerial Images
P. Rao
Feng Qiao
Weide Zhang
Yiliang Xu
Yong Deng
Guangbin Wu
Qiang Zhang
29
8
0
29 Nov 2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang
William Zhu
Baoxiong Jia
Zan Wang
Xiaojian Ma
Qing Li
Siyuan Huang
37
5
0
28 Nov 2022
Medical Image Segmentation Review: The success of U-Net
Reza Azad
Ehsan Khodapanah Aghdam
Amelie Rauland
Yiwei Jia
Atlas Haddadi Avval
Afshin Bozorgpour
Sanaz Karimijafarbigloo
Joseph Paul Cohen
Ehsan Adeli
Dorit Merhof
SSeg
25
265
0
27 Nov 2022
Semantic-Aware Local-Global Vision Transformer
Jiatong Zhang
Zengwei Yao
Fanglin Chen
Guangming Lu
Wenjie Pei
ViT
25
0
0
27 Nov 2022
CMC v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors
Junlin Hou
Jilan Xu
Nan Zhang
Yi Wang
Yuejie Zhang
Xuanyang Zhang
Rui Feng
29
2
0
26 Nov 2022
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
Zixiang Zhao
Hao Bai
Jiangshe Zhang
Yulun Zhang
Shuang Xu
Zudi Lin
Radu Timofte
Luc Van Gool
37
309
0
26 Nov 2022
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
46
5
0
25 Nov 2022
Spatial Mixture-of-Experts
Nikoli Dryden
Torsten Hoefler
MoE
34
9
0
24 Nov 2022
EurNet: Efficient Multi-Range Relational Modeling of Spatial Multi-Relational Data
Minghao Xu
Yuanfan Guo
Yi Xu
Jiangtao Tang
Xinlei Chen
Yuandong Tian
GNN
13
6
0
23 Nov 2022
Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring
Lingshun Kong
Jiangxin Dong
Mingqiang Li
J. Ge
Jin-shan Pan
ViT
32
142
0
22 Nov 2022
Uncertainty-aware Vision-based Metric Cross-view Geolocalization
F. Fervers
Sebastian Bullinger
C. Bodensteiner
Michael Arens
Rainer Stiefelhagen
26
39
0
22 Nov 2022