ResearchTrend.AI
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

24 February 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
    ViT

Papers citing "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions"

50 / 624 papers shown
CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Runsheng Xu
Zhengzhong Tu
Hao Xiang
Wei Shao
Bolei Zhou
Jiaqi Ma
56
218
0
05 Jul 2022
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung
Jun Gao
Fangyin Wei
Sanja Fidler
21
3
0
05 Jul 2022
DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
Zhuo Chen
Yufen Huang
Jiaoyan Chen
Yuxia Geng
Wen Zhang
Yin Fang
Jeff Z. Pan
Huajun Chen
VLM
29
65
0
04 Jul 2022
TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection
Chang Liu
Gang Yang
Shuo Wang
Hangxu Wang
Yunhua Zhang
Yutao Wang
ViT
37
17
0
04 Jul 2022
Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification
Bo-Wen Zhang
Jiakang Yuan
Baopu Li
Tao Chen
Jiayuan Fan
Botian Shi
ViT
27
31
0
02 Jul 2022
CV 3315 Is All You Need : Semantic Segmentation Competition
Akide Liu
Zihan Wang
32
4
0
25 Jun 2022
Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency
Weijie Ma
Ye Zhu
Ruimao Zhang
Jie-jin Yang
Yiwen Hu
Zhuguo Li
Lijuan Xiang
ViT
MedIm
16
3
0
23 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
34
32
0
19 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
24
25
0
17 Jun 2022
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
Rui Zhu
Zhengqin Li
J. Matai
Fatih Porikli
Manmohan Chandraker
ViT
43
46
0
16 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
37
97
0
16 Jun 2022
Patch-level Representation Learning for Self-supervised Vision Transformers
Sukmin Yun
Hankook Lee
Jaehyung Kim
Jinwoo Shin
ViT
22
64
0
16 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
Chong Li
Biao Wang
Xihan Wei
Lei Zhang
M. Keuper
Xia Hua
ViT
34
15
0
15 Jun 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
Shanghua Gao
Zhong-Yu Li
Qi Han
Ming-Ming Cheng
Liang Wang
32
34
0
14 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
37
15
0
13 Jun 2022
DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis
Pei Liu
Bo Fu
Feng Ye
Rui Yang
Bin Xu
Luping Ji
13
24
0
12 Jun 2022
NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition
Hanting Li
Ming-Fa Sui
Zhaoqing Zhu
Feng Zhao
25
27
0
10 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
26
252
0
06 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
23
347
0
02 Jun 2022
XBound-Former: Toward Cross-scale Boundary Modeling in Transformers
Jiacheng Wang
Fei Chen
Yuxi Ma
Liansheng Wang
Zhaodong Fei
Jia Shuai
Xiangdong Tang
Qichao Zhou
Jing Qin
ViT
MedIm
27
63
0
02 Jun 2022
Vision GNN: An Image is Worth Graph of Nodes
Kai Han
Yunhe Wang
Jianyuan Guo
Yehui Tang
Enhua Wu
GNN
3DH
15
352
0
01 Jun 2022
Visual Transformer for Object Detection
M. Yang
ViT
25
6
0
01 Jun 2022
Fair Comparison between Efficient Attentions
Jiuk Hong
Chaehyeon Lee
Soyoun Bang
Heechul Jung
22
1
0
01 Jun 2022
ViT-BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation
Pramit Dutta
Ganesh Sistu
S. Yogamani
E. López
J. McDonald
ViT
19
16
0
31 May 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
64
26
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
118
17
0
30 May 2022
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Yangyang Xu
Xiangtai Li
Haobo Yuan
Yibo Yang
Lefei Zhang
ViT
28
45
0
28 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
35
68
0
26 May 2022
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
31
187
0
25 May 2022
ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions
Difan Liu
Sandesh Shetty
Tobias Hinz
Matthew Fisher
Richard Y. Zhang
Taesung Park
E. Kalogerakis
ViT
27
30
0
24 May 2022
Super Vision Transformer
Mingbao Lin
Yonghong Tian
Yuxin Zhang
Yunhang Shen
Rongrong Ji
Liujuan Cao
ViT
46
20
0
23 May 2022
SelfReformer: Self-Refined Network with Transformer for Salient Object Detection
Y. Yun
Weisi Lin
ViT
60
28
0
23 May 2022
Knowledge Distillation via the Target-aware Transformer
Sihao Lin
Hongwei Xie
Bing Wang
Kaicheng Yu
Xiaojun Chang
Xiaodan Liang
G. Wang
ViT
20
104
0
22 May 2022
Boosting Camouflaged Object Detection with Dual-Task Interactive Transformer
Zheng Liu
Zhili Zhang
Wei Wu
32
46
0
21 May 2022
Improvements to Self-Supervised Representation Learning for Masked Image Modeling
Jia-ju Mao
Xuesong Yin
Yuan Chang
Honggu Zhou
SSL
27
1
0
21 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
116
73
0
20 May 2022
HCFormer: Unified Image Segmentation with Hierarchical Clustering
Teppei Suzuki
27
0
0
20 May 2022
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
Feng Liu
Xiaosong Zhang
Zhiliang Peng
Zonghao Guo
Fang Wan
Xian-Wei Ji
QiXiang Ye
ObjD
43
20
0
19 May 2022
Cross-Enhancement Transformer for Action Segmentation
Jiahui Wang
Zhenyou Wang
Shanna Zhuang
Hui Wang
ViT
54
23
0
19 May 2022
Transformer Scale Gate for Semantic Segmentation
Hengcan Shi
Munawar Hayat
Jianfei Cai
ViT
32
22
0
14 May 2022
Vision Transformer: Vit and its Derivatives
Zujun Fu
ViT
41
6
0
12 May 2022
AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation
Xu Cao
Xiaoye Li
Liya Ma
Yi Huang
X. Feng
Zening Chen
H. Zeng
Jianguo Cao
ViT
13
21
0
11 May 2022
Activating More Pixels in Image Super-Resolution Transformer
Xiangyu Chen
Xintao Wang
Jiantao Zhou
Yu Qiao
Chao Dong
ViT
79
602
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
L. Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
31
181
0
06 May 2022
Sequencer: Deep LSTM for Image Classification
Yuki Tatsunami
Masato Taki
VLM
ViT
16
78
0
04 May 2022
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
Meng He
Yali Wang
Jiaxi Wu
Yiru Wang
Hanqing Li
Bo-wen Li
Weihao Gan
Wei Wu
Yu Qiao
31
69
0
03 May 2022
Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer
Wu Yun
Mengshi Qi
Chuanming Wang
Huiyuan Fu
Huadong Ma
ViT
13
6
0
30 Apr 2022
Depth Estimation with Simplified Transformer
John Yang
Le An
Anurag Dixit
Jinkyu Koo
Su Inn Park
MDE
33
21
0
28 Apr 2022
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
28
514
0
26 Apr 2022