Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2102.12122
Cited By
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
24 February 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions"
50 / 624 papers shown
Title
CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Runsheng Xu
Zhengzhong Tu
Hao Xiang
Wei Shao
Bolei Zhou
Jiaqi Ma
56
218
0
05 Jul 2022
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung
Jun Gao
Fangyin Wei
Sanja Fidler
21
3
0
05 Jul 2022
DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
Zhuo Chen
Yufen Huang
Jiaoyan Chen
Yuxia Geng
Wen Zhang
Yin Fang
Jeff Z. Pan
Huajun Chen
VLM
29
65
0
04 Jul 2022
TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection
Chang Liu
Gang Yang
Shuo Wang
Hangxu Wang
Yunhua Zhang
Yutao Wang
ViT
37
17
0
04 Jul 2022
Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification
Bo-Wen Zhang
Jiakang Yuan
Baopu Li
Tao Chen
Jiayuan Fan
Botian Shi
ViT
27
31
0
02 Jul 2022
CV 3315 Is All You Need : Semantic Segmentation Competition
Akide Liu
Zihan Wang
32
4
0
25 Jun 2022
Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency
Weijie Ma
Ye Zhu
Ruimao Zhang
Jie-jin Yang
Yiwen Hu
Zhuguo Li
Lijuan Xiang
ViT
MedIm
16
3
0
23 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
34
32
0
19 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
24
25
0
17 Jun 2022
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
Rui Zhu
Zhengqin Li
J. Matai
Fatih Porikli
Manmohan Chandraker
ViT
43
46
0
16 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
37
97
0
16 Jun 2022
Patch-level Representation Learning for Self-supervised Vision Transformers
Sukmin Yun
Hankook Lee
Jaehyung Kim
Jinwoo Shin
ViT
22
64
0
16 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
Chong Li
Biao Wang
Xihan Wei
Lei Zhang
M. Keuper
Xia Hua
ViT
34
15
0
15 Jun 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
Shanghua Gao
Zhong-Yu Li
Qi Han
Ming-Ming Cheng
Liang Wang
32
34
0
14 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
37
15
0
13 Jun 2022
DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis
Pei Liu
Bo Fu
Feng Ye
Rui Yang
Bin Xu
Luping Ji
13
24
0
12 Jun 2022
NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition
Hanting Li
Ming-Fa Sui
Zhaoqing Zhu
Feng Zhao
25
27
0
10 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
26
252
0
06 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
23
347
0
02 Jun 2022
XBound-Former: Toward Cross-scale Boundary Modeling in Transformers
Jiacheng Wang
Fei Chen
Yuxi Ma
Liansheng Wang
Zhaodong Fei
Jia Shuai
Xiangdong Tang
Qichao Zhou
Jing Qin
ViT
MedIm
27
63
0
02 Jun 2022
Vision GNN: An Image is Worth Graph of Nodes
Kai Han
Yunhe Wang
Jianyuan Guo
Yehui Tang
Enhua Wu
GNN
3DH
15
352
0
01 Jun 2022
Visual Transformer for Object Detection
M. Yang
ViT
25
6
0
01 Jun 2022
Fair Comparison between Efficient Attentions
Jiuk Hong
Chaehyeon Lee
Soyoun Bang
Heechul Jung
22
1
0
01 Jun 2022
ViT-BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation
Pramit Dutta
Ganesh Sistu
S. Yogamani
E. López
J. McDonald
ViT
19
16
0
31 May 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
64
26
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
118
17
0
30 May 2022
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Yangyang Xu
Xiangtai Li
Haobo Yuan
Yibo Yang
Lefei Zhang
ViT
28
45
0
28 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
35
68
0
26 May 2022
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
31
187
0
25 May 2022
ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions
Difan Liu
Sandesh Shetty
Tobias Hinz
Matthew Fisher
Richard Y. Zhang
Taesung Park
E. Kalogerakis
ViT
27
30
0
24 May 2022
Super Vision Transformer
Mingbao Lin
Yonghong Tian
Yuxin Zhang
Yunhang Shen
Rongrong Ji
Liujuan Cao
ViT
46
20
0
23 May 2022
SelfReformer: Self-Refined Network with Transformer for Salient Object Detection
Y. Yun
Weisi Lin
ViT
60
28
0
23 May 2022
Knowledge Distillation via the Target-aware Transformer
Sihao Lin
Hongwei Xie
Bing Wang
Kaicheng Yu
Xiaojun Chang
Xiaodan Liang
G. Wang
ViT
20
104
0
22 May 2022
Boosting Camouflaged Object Detection with Dual-Task Interactive Transformer
Zheng Liu
Zhili Zhang
Wei Wu
32
46
0
21 May 2022
Improvements to Self-Supervised Representation Learning for Masked Image Modeling
Jia-ju Mao
Xuesong Yin
Yuan Chang
Honggu Zhou
SSL
27
1
0
21 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
116
73
0
20 May 2022
HCFormer: Unified Image Segmentation with Hierarchical Clustering
Teppei Suzuki
27
0
0
20 May 2022
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
Feng Liu
Xiaosong Zhang
Zhiliang Peng
Zonghao Guo
Fang Wan
Xian-Wei Ji
QiXiang Ye
ObjD
43
20
0
19 May 2022
Cross-Enhancement Transformer for Action Segmentation
Jiahui Wang
Zhenyou Wang
Shanna Zhuang
Hui Wang
ViT
54
23
0
19 May 2022
Transformer Scale Gate for Semantic Segmentation
Hengcan Shi
Munawar Hayat
Jianfei Cai
ViT
32
22
0
14 May 2022
Vision Transformer: Vit and its Derivatives
Zujun Fu
ViT
41
6
0
12 May 2022
AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation
Xu Cao
Xiaoye Li
Liya Ma
Yi Huang
X. Feng
Zening Chen
H. Zeng
Jianguo Cao
ViT
13
21
0
11 May 2022
Activating More Pixels in Image Super-Resolution Transformer
Xiangyu Chen
Xintao Wang
Jiantao Zhou
Yu Qiao
Chao Dong
ViT
79
602
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
L. Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
31
181
0
06 May 2022
Sequencer: Deep LSTM for Image Classification
Yuki Tatsunami
Masato Taki
VLM
ViT
16
78
0
04 May 2022
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
Meng He
Yali Wang
Jiaxi Wu
Yiru Wang
Hanqing Li
Bo-wen Li
Weihao Gan
Wei Wu
Yu Qiao
31
69
0
03 May 2022
Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer
Wu Yun
Mengshi Qi
Chuanming Wang
Huiyuan Fu
Huadong Ma
ViT
13
6
0
30 Apr 2022
Depth Estimation with Simplified Transformer
John Yang
Le An
Anurag Dixit
Jinkyu Koo
Su Inn Park
MDE
33
21
0
28 Apr 2022
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
28
514
0
26 Apr 2022
Previous
1
2
3
...
7
8
9
...
11
12
13
Next