ResearchTrend.AI
PVT v2: Improved Baselines with Pyramid Vision Transformer

25 June 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
    ViT
    AI4TS

Papers citing "PVT v2: Improved Baselines with Pyramid Vision Transformer"

50 / 551 papers shown
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models
Phuoc-Hoan Charles Le
Xinlin Li
ViT
MQ
25
21
0
29 Jun 2023
ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining
Dezhi Peng
Chongyu Liu
Yuliang Liu
Lianwen Jin
DiffM
19
9
0
21 Jun 2023
SegT: A Novel Separated Edge-guidance Transformer Network for Polyp Segmentation
Feiyu Chen
Haiping Ma
Weijia Zhang
ViT
MedIm
33
7
0
19 Jun 2023
Revisiting Token Pruning for Object Detection and Instance Segmentation
Yifei Liu
Mathias Gehrig
Nico Messikommer
Marco Cannici
Davide Scaramuzza
ViT
VLM
37
24
0
12 Jun 2023
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Haoran You
Huihong Shi
Yipin Guo
Yingyan Lin
31
16
0
10 Jun 2023
FasterViT: Fast Vision Transformers with Hierarchical Attention
Ali Hatamizadeh
Greg Heinrich
Hongxu Yin
Andrew Tao
J. Álvarez
Jan Kautz
Pavlo Molchanov
ViT
20
67
0
09 Jun 2023
Lightweight Structure-aware Transformer Network for VHR Remote Sensing Image Change Detection
Tao Lei
Yetong Xu
Hailong Ning
Z. Lv
Chongdan Min
Yaochu Jin
A. Nandi
ViT
6
6
0
03 Jun 2023
Collect-and-Distribute Transformer for 3D Point Cloud Analysis
Haibo Qiu
Baosheng Yu
Dacheng Tao
3DPC
ViT
27
6
0
02 Jun 2023
Lightweight Vision Transformer with Bidirectional Interaction
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Ran He
ViT
50
28
0
01 Jun 2023
GazeGNN: A Gaze-Guided Graph Neural Network for Chest X-ray Classification
Bin Wang
Hongyi Pan
Armstrong Aboah
Zheyu Zhang
Elif Keles
Drew A Torigian
B. Turkbey
Elizabeth A. Krupinski
J. Udupa
Ulas Bagci
34
21
0
29 May 2023
CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models
Zhongxi Chen
Ke Sun
Xianming Lin
Rongrong Ji
DiffM
30
27
0
29 May 2023
Dual Path Transformer with Partition Attention
Zhengkai Jiang
Liang Liu
Jiangning Zhang
Yabiao Wang
Mingang Chen
Chengjie Wang
ViT
34
2
0
24 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
103
77
0
22 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
35
457
0
18 May 2023
Annotation-free Audio-Visual Segmentation
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya-Qin Zhang
Weidi Xie
VOS
VLM
36
28
0
18 May 2023
Content-based Unrestricted Adversarial Attack
Zhaoyu Chen
Bo-wen Li
Shuang Wu
Kaixun Jiang
Shouhong Ding
Wenqiang Zhang
DiffM
29
61
0
18 May 2023
Object Segmentation by Mining Cross-Modal Semantics
Zongwei Wu
Jingjing Wang
Zhuyun Zhou
Zhaochong An
Qiuping Jiang
C. Demonceaux
Guolei Sun
Radu Timofte
30
17
0
17 May 2023
CageViT: Convolutional Activation Guided Efficient Vision Transformer
Hao Zheng
Jinbao Wang
Xiantong Zhen
H. Chen
Jingkuan Song
Feng Zheng
ViT
20
0
0
17 May 2023
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
Xinyu Liu
Houwen Peng
Ningxin Zheng
Yuqing Yang
Han Hu
Yixuan Yuan
ViT
25
276
0
11 May 2023
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
Zhaoyang Liu
Yinan He
Wenhai Wang
Weiyun Wang
Yi Wang
...
Yali Wang
Limin Wang
Ping Luo
Jifeng Dai
Yu Qiao
LRM
MLLM
24
79
0
09 May 2023
Semantic Segmentation using Vision Transformers: A survey
Hans Thisanke
Chamli Deshan
K. Chamith
Sachith Seneviratne
Rajith Vidanaarachchi
Damayanthi Herath
ViT
37
146
0
05 May 2023
OctFormer: Octree-based Transformers for 3D Point Clouds
Peng-Shuai Wang
ViT
3DPC
32
81
0
04 May 2023
Revisiting the Encoding of Satellite Image Time Series
Xin Cai
Y. Bi
Peter Nicholl
Roy Sterritt
AI4TS
35
3
0
03 May 2023
Faster OreFSDet : A Lightweight and Effective Few-shot Object Detector for Ore Images
Yang Zhang
Lei Cheng
Yuting Peng
C. Xu
Yanwei Fu
Bo Wu
Guodong Sun
ObjD
76
6
0
02 May 2023
UniNeXt: Exploring A Unified Architecture for Vision Recognition
Fangjian Lin
Jianlong Yuan
Sitong Wu
Fan Wang
Zhibin Wang
ViT
26
14
0
26 Apr 2023
AutoFocusFormer: Image Segmentation off the Grid
Chen Ziwen
K. Patnaik
Shuangfei Zhai
Alvin Wan
Zhile Ren
A. Schwing
Alex Colburn
Li Fuxin
22
9
0
24 Apr 2023
Advances in Deep Concealed Scene Understanding
Deng-Ping Fan
Ge-Peng Ji
Peng-Tao Xu
Ming-Ming Cheng
Christos Sakaridis
Luc Van Gool
35
69
0
21 Apr 2023
DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification
Di Wang
Jing Zhang
Bo Du
L. Zhang
Dacheng Tao
21
50
0
19 Apr 2023
EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation
Ilwi Yun
Chanyong Shin
Hyunku Lee
Hyuk-Jae Lee
Chae-Eun Rhee
ViT
MDE
29
17
0
16 Apr 2023
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
Enze Xie
Lewei Yao
Han Shi
Zhili Liu
Daquan Zhou
Zhaoqiang Liu
Jiawei Li
Zhenguo Li
28
76
0
13 Apr 2023
SpectFormer: Frequency and Attention is what you need in a Vision Transformer
Badri N. Patro
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
ViT
35
47
0
13 Apr 2023
Why Existing Multimodal Crowd Counting Datasets Can Lead to Unfulfilled Expectations in Real-World Applications
M. Thissen
Elke Hergenröther
25
1
0
13 Apr 2023
Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space
Seokju Yun
Youngmin Ro
ViT
24
2
0
13 Apr 2023
SAM Struggles in Concealed Scenes -- Empirical Study on "Segment Anything"
Ge-Peng Ji
Deng-Ping Fan
Peng-Tao Xu
Ming-Ming Cheng
Bowen Zhou
Luc Van Gool
26
96
0
12 Apr 2023
Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
Xuran Pan
Tianzhu Ye
Zhuofan Xia
S. Song
Gao Huang
ViT
33
53
0
09 Apr 2023
UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner
Yiwen Ye
Yutong Xie
Jianpeng Zhang
Ziyang Chen
Yong-quan Xia
SSL
33
40
0
07 Apr 2023
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
Mingyu Ding
Yikang Shen
Lijie Fan
Zhenfang Chen
Z. Chen
Ping Luo
J. Tenenbaum
Chuang Gan
ViT
84
14
0
06 Apr 2023
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen
Yuyuan Liu
Hu Wang
Fengbei Liu
Chong Wang
Helen Frazer
G. Carneiro
VOS
27
15
0
06 Apr 2023
Hierarchical Vision Transformers for Cardiac Ejection Fraction Estimation
Lhuqita Fazry
Asep Haryono
Nuzulul Khairu Nissa
Sunarno
Naufal Muhammad Hirzi
M. F. Rachmadi
W. Jatmiko
MedIm
16
16
0
31 Mar 2023
Rethinking Local Perception in Lightweight Vision Transformer
Qi Fan
Huaibo Huang
Jiyang Guan
Ran He
ViT
28
30
0
31 Mar 2023
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
Xuanyao Chen
Zhijian Liu
Haotian Tang
Li Yi
Hang Zhao
Song Han
ViT
26
46
0
30 Mar 2023
Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman
R. Marculescu
MedIm
ViT
24
44
0
29 Mar 2023
InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu
Pan Zhou
Shuicheng Yan
Xinchao Wang
48
119
0
29 Mar 2023
Multi-modal learning for geospatial vegetation forecasting
V. Benson
Claire Robin
C. Requena-Mesa
Lazaro Alonso
Nuno Carvalhais
José A. Cortés
Zhihan Gao
Nora Linscheid
M. Weynants
Markus Reichstein
30
11
0
28 Mar 2023
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
Abdelrahman M. Shaker
Muhammad Maaz
H. Rasheed
Salman Khan
Ming Yang
F. Khan
ViT
50
84
0
27 Mar 2023
Vision Transformer with Quadrangle Attention
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
24
38
0
27 Mar 2023
Supervised Masked Knowledge Distillation for Few-Shot Transformers
Hanxi Lin
G. Han
Jiawei Ma
Shiyuan Huang
Xudong Lin
Shih-Fu Chang
24
35
0
25 Mar 2023
Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR
Aneeshan Sain
A. Bhunia
Subhadeep Koley
Pinaki Nath Chowdhury
Soumitri Chattopadhyay
Tao Xiang
Yi-Zhe Song
28
18
0
24 Mar 2023
Spherical Transformer for LiDAR-based 3D Recognition
Xin Lai
Yukang Chen
Fanbin Lu
Jianhui Liu
Jiaya Jia
3DPC
37
126
0
22 Mar 2023
LEAPS: End-to-End One-Step Person Search With Learnable Proposals
Zhiqiang Dong
Jiale Cao
Rao Muhammad Anwer
J. Xie
Fahad Khan
Yanwei Pang
26
1
0
21 Mar 2023