PVT v2: Improved Baselines with Pyramid Vision Transformer

25 June 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
    ViT
    AI4TS
arXiv: 2106.13797

Papers citing "PVT v2: Improved Baselines with Pyramid Vision Transformer"

50 / 550 papers shown
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
147
361
0
24 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
QuadTree Attention for Vision Transformers
Shitao Tang
Jiahui Zhang
Siyu Zhu
Ping Tan
ViT
163
156
0
08 Jan 2022
PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture
Kai Han
Jianyuan Guo
Yehui Tang
Yunhe Wang
ViT
26
22
0
04 Jan 2022
Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
S. Song
Li Erran Li
Gao Huang
ViT
24
456
0
03 Jan 2022
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu
Tianyi Wu
Hao Hao Tan
G. Guo
ViT
25
70
0
28 Dec 2021
SimViT: Exploring a Simple Vision Transformer with sliding windows
Gang Li
Di Xu
Xingyi Cheng
Lingyu Si
Changwen Zheng
ViT
26
16
0
24 Dec 2021
Few-Shot Object Detection: A Comprehensive Survey
Mona Köhler
M. Eisenbach
H. Groß
ObjD
24
59
0
22 Dec 2021
MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation
Zhongzhi Yu
Y. Fu
Sicheng Li
Chaojian Li
Yingyan Lin
ViT
31
19
0
21 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
29
244
0
21 Dec 2021
Lite Vision Transformer with Enhanced Self-Attention
Chenglin Yang
Yilin Wang
Jianming Zhang
He Zhang
Zijun Wei
Zhe-nan Lin
Alan Yuille
ViT
21
112
0
20 Dec 2021
Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos
Pengfei Pei
Xianfeng Zhao
Yun Cao
Jinchuan Li
Xiaowei Yi
ViT
22
8
0
15 Dec 2021
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai
Srijan Das
Kumara Kahatapitiya
Michael S. Ryoo
F. Brémond
ViT
39
73
0
07 Dec 2021
GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation
Weixuan Sun
Jing Zhang
Zheyuan Liu
Yiran Zhong
Nick Barnes
ViT
60
14
0
06 Dec 2021
Dynamic Token Normalization Improves Vision Transformers
Wenqi Shao
Yixiao Ge
Zhaoyang Zhang
Xuyuan Xu
Xiaogang Wang
Ying Shan
Ping Luo
ViT
121
11
0
05 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
A. Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
93
2,269
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
48
677
0
02 Dec 2021
Shunted Self-Attention via Multi-Scale Token Aggregation
Sucheng Ren
Daquan Zhou
Shengfeng He
Jiashi Feng
Xinchao Wang
ViT
25
222
0
30 Nov 2021
A Unified Pruning Framework for Vision Transformers
Hao Yu
Jianxin Wu
ViT
26
59
0
30 Nov 2021
Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints
Jaesin Ahn
Jiuk Hong
Jeongwoo Ju
Heechul Jung
ViT
24
3
0
19 Nov 2021
Full-attention based Neural Architecture Search using Context Auto-regression
Yuan Zhou
Haiyang Wang
Shuwei Huo
Boyu Wang
25
3
0
13 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
71
330
0
11 Nov 2021
Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu
Yinghui Li
Li Tao
Dun Liang
Haitao Zheng
85
97
0
07 Nov 2021
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
Zhe Chen
Jiahao Wang
Wenhai Wang
Guo Chen
Enze Xie
Ping Luo
Tong Lu
ObjD
17
9
0
03 Nov 2021
Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng
Huijie Pan
Lingpeng Kong
23
3
0
06 Oct 2021
MISSFormer: An Effective Medical Image Segmentation Transformer
Xiaohong Huang
Zhifang Deng
Dandan Li
Xueguang Yuan
ViT
MedIm
87
174
0
15 Sep 2021
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
Zhiqi Li
Wenhai Wang
Enze Xie
Zhiding Yu
Anima Anandkumar
J. Álvarez
Ping Luo
Tong Lu
ViT
34
135
0
08 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
43
105
0
30 Aug 2021
Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance
Jiaming Zhang
Kailun Yang
Angela Constantinescu
Kunyu Peng
Karin Muller
Rainer Stiefelhagen
ViT
33
69
0
20 Aug 2021
Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net
Yu Qiu
Yun-Hai Liu
Le Zhang
Jing Xu
ViT
19
30
0
17 Aug 2021
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
B. Dong
Wenhai Wang
Deng-Ping Fan
Jinpeng Li
H. Fu
Ling Shao
ViT
MedIm
24
314
0
16 Aug 2021
S$^2$-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
39
50
0
02 Aug 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
19
231
0
21 Jul 2021
Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms
Chenglin Yang
Siyuan Qiao
Adam Kortylewski
Alan Yuille
22
4
0
12 Jul 2021
Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu
Wei Ping
Chaowei Xiao
M. Shoeybi
Tom Goldstein
Anima Anandkumar
Bryan Catanzaro
ViT
VLM
21
131
0
05 Jul 2021
CBNet: A Composite Backbone Network Architecture for Object Detection
Tingting Liang
Xiao Chu
Yudong Liu
Yongtao Wang
Zhi Tang
Wei Chu
Jingdong Chen
Haibin Ling
ObjD
13
161
0
01 Jul 2021
K-Net: Towards Unified Image Segmentation
Wenwei Zhang
Jiangmiao Pang
Kai-xiang Chen
Chen Change Loy
ISeg
32
356
0
28 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Ming-Ming Cheng
ViT
24
219
0
22 Jun 2021
On the Connection between Local Attention and Dynamic Depth-wise Convolution
Qi Han
Zejia Fan
Qi Dai
Lei-huan Sun
Ming-Ming Cheng
Jiaying Liu
Jingdong Wang
ViT
21
105
0
08 Jun 2021
Vision Transformers with Hierarchical Attention
Yun-Hai Liu
Yu-Huan Wu
Guolei Sun
Le Zhang
Ajad Chhatkuli
Luc Van Gool
ViT
35
32
0
06 Jun 2021
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
Meng-Hao Guo
Zheng-Ning Liu
Tai-Jiang Mu
Shimin Hu
20
473
0
05 May 2021
Visformer: The Vision-friendly Transformer
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
117
209
0
26 Apr 2021
TransCenter: Transformers with Dense Representations for Multiple-Object Tracking
Yihong Xu
Yutong Ban
Guillaume Delorme
Chuang Gan
Daniela Rus
Xavier Alameda-Pineda
VOT
25
92
0
28 Mar 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
284
1,524
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
277
3,622
0
24 Feb 2021
Salient Object Detection via Integrity Learning
Mingchen Zhuge
Deng-Ping Fan
Nian Liu
Dingwen Zhang
Dong Xu
Ling Shao
AAML
58
296
0
19 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
227
2,428
0
04 Jan 2021
How Much Position Information Do Convolutional Neural Networks Encode?
Md. Amirul Islam
Sen Jia
Neil D. B. Bruce
SSL
205
344
0
22 Jan 2020
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,561
0
17 Apr 2017
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,216
0
16 Nov 2016