ResearchTrend.AI
PVT v2: Improved Baselines with Pyramid Vision Transformer (arXiv:2106.13797)

25 June 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
    ViT
    AI4TS

Papers citing "PVT v2: Improved Baselines with Pyramid Vision Transformer"

50 / 551 papers shown
Title
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
Peiru Zheng
Yun Zhao
Zhan Gong
Hong Zhu
Shaohua Wu
MLLM
35
7
0
31 Jul 2024
Floating No More: Object-Ground Reconstruction from a Single Image
Yunze Man
Yichen Sheng
Jianming Zhang
Liangyan Gui
Yu-xiong Wang
34
2
0
26 Jul 2024
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi
Minjing Dong
Mingjia Li
Chang Xu
Mamba
33
5
0
26 Jul 2024
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu
Yubin Cho
Beoungwoo Kang
Seunghun Moon
Kyeongbo Kong
Suk-Ju Kang
30
3
0
24 Jul 2024
MxT: Mamba x Transformer for Image Inpainting
Shuang Chen
Amir Atapour-Abarghouei
Haozheng Zhang
Hubert P. H. Shum
Mamba
40
2
0
23 Jul 2024
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He
Henghui Ding
Xudong Jiang
Bihan Wen
3DV
MLLM
3DPC
48
18
0
18 Jul 2024
Learning Camouflaged Object Detection from Noisy Pseudo Label
Jin Zhang
Ruiheng Zhang
Yanjiao Shi
Zhe Cao
Nian Liu
Fahad Shahbaz Khan
29
5
0
18 Jul 2024
FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
Jian-wen Zhao
Xin Li
Fan Yang
Qiang Zhai
Ao Luo
Zicheng Jiao
Hong Cheng
DiffM
42
7
0
18 Jul 2024
GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman M. Shaker
Syed Talal Wasim
Salman Khan
Juergen Gall
Fahad Shahbaz Khan
Mamba
56
0
0
18 Jul 2024
OAM-TCD: A globally diverse dataset of high-resolution tree cover maps
Josh Veitch-Michaelis
Andrew Cottam
Daniella Schweizer
Eben N. Broadbent
David Dao
Ce Zhang
Angélica María Almeyda Zambrano
Simeon Max
35
1
0
16 Jul 2024
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xueye Zheng
Yuanhuiyi Lyu
Jiazhou Zhou
Lin Wang
27
8
0
16 Jul 2024
TCFormer: Visual Recognition via Token Clustering Transformer
Wang Zeng
Sheng Jin
Lumin Xu
Wentao Liu
Chao Qian
Wanli Ouyang
Ping Luo
Xiaogang Wang
33
3
0
16 Jul 2024
Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion
Shiqi Tan
H. Fazlali
Yixuan Xu
Y. Ren
Bingbing Liu
3DPC
30
1
0
12 Jul 2024
H-FCBFormer Hierarchical Fully Convolutional Branch Transformer for Occlusal Contact Segmentation with Articulating Paper
Ryan Banks
B. Rovira-Lastra
Jordi Martinez-Gomis
A. Chaurasia
Yunpeng Li
MedIm
38
0
0
10 Jul 2024
HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation
Guoan Xu
Wenjing Jia
Tao Wu
Ligeng Chen
Guangwei Gao
ViT
38
9
0
10 Jul 2024
Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken
Peifu Liu
Tingfa Xu
Jie Wang
Huan Chen
Huiyan Bai
Jianan Li
30
3
0
10 Jul 2024
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab
M. Maruf
Arka Daw
Harish Babu Manogaran
Abhilash Neog
...
Paula Mabee
Wasila M Dahdul
Anuj Karpatne
41
4
0
10 Jul 2024
HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification
Omar S. El-Assiouti
Ghada Hamed
Dina Khattab
H. M. Ebied
39
1
0
10 Jul 2024
CTRL-F: Pairing Convolution with Transformer for Image Classification via Multi-Level Feature Cross-Attention and Representation Learning Fusion
Hosam S. El-Assiouti
Hadeer El-Saadawy
M. Al-Berry
M. Tolba
ViT
52
0
0
09 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
40
2
0
07 Jul 2024
Isomorphic Pruning for Vision Models
Gongfan Fang
Xinyin Ma
Michael Bi Mi
Xinchao Wang
VLM
ViT
34
6
0
05 Jul 2024
VFIMamba: Video Frame Interpolation with State Space Models
Guozhen Zhang
Chunxu Liu
Yutao Cui
Xiaotong Zhao
Kai Ma
Limin Wang
45
8
0
02 Jul 2024
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
Qi Ma
Danda Pani Paudel
E. Konukoglu
Luc Van Gool
38
6
0
25 Jun 2024
CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation
Tingwei Liu
Miao Zhang
Leiye Liu
Jialong Zhong
Shuyao Wang
Yongri Piao
Huchuan Lu
MedIm
DiffM
27
2
0
20 Jun 2024
SALI: Short-term Alignment and Long-term Interaction Network for Colonoscopy Video Polyp Segmentation
Qiang Hu
Zhenyu Yi
Ying Zhou
Fang Peng
Mei Liu
Qiang Li
Zhiwei Wang
27
2
0
19 Jun 2024
SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation
Zhenchao Lin
Li He
Hongqiang Yang
Xiaoqun Sun
Cuojin Zhang
Weinan Chen
Yisheng Guan
Hong Zhang
3DPC
29
0
0
17 Jun 2024
ProMotion: Prototypes As Motion Learners
Yawen Lu
Dongfang Liu
Qifan Wang
Cheng Han
Yiming Cui
Zhiwen Cao
Xueling Zhang
Yingjie Victor Chen
Heng Fan
DiffM
40
2
0
07 Jun 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks
Chenxin Tao
Xizhou Zhu
Shiqian Su
Lewei Lu
Changyao Tian
...
Gao Huang
Hongsheng Li
Yu Qiao
Jie Zhou
Jifeng Dai
70
1
0
06 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
48
6
0
06 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
53
0
0
04 Jun 2024
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang
Hanpeng Liu
Stephen Lin
Kun He
53
5
0
01 Jun 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
54
4
0
28 May 2024
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Wenzhuo Liu
Fei Zhu
Shijie Ma
Cheng-Lin Liu
30
4
0
28 May 2024
Demystify Mamba in Vision: A Linear Attention Perspective
Dongchen Han
Ziyi Wang
Zhuofan Xia
Yizeng Han
Yifan Pu
Chunjiang Ge
Jun Song
Shiji Song
Bo Zheng
Gao Huang
Mamba
34
49
0
26 May 2024
Building Vision Models upon Heat Conduction
Zhaozhi Wang
Yue Liu
Yunfan Liu
Hongtian Yu
Yaowei Wang
QiXiang Ye
ViT
VLM
55
0
0
26 May 2024
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
Yuheng Shi
Minjing Dong
Chang Xu
Mamba
48
32
0
23 May 2024
Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
51
0
0
22 May 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
48
5
0
22 May 2024
Influence of Water Droplet Contamination for Transparency Segmentation
Volker Knauthe
Paul Weitz
Thomas Pollabauer
Tristan Wirth
Arne Rak
Arjan Kuijper
Dieter W. Fellner
43
1
0
21 May 2024
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu
Ce Zheng
Ming Qian
Nan Xue
Cheng Chen
Zhebin Zhang
Chen Li
Tianfu Wu
38
2
0
20 May 2024
Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation
Bike Chen
Chen Gong
Juha Röning
3DPC
53
3
0
16 May 2024
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang
Jing Zhang
Di Wang
Qiming Zhang
Zengmao Wang
Bo Du
37
5
0
16 May 2024
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu
Xinchao Wang
Mamba
50
48
0
13 May 2024
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman
Mustafa Munir
R. Marculescu
MedIm
32
33
0
11 May 2024
Vision Mamba: A Comprehensive Survey and Taxonomy
Xiao Liu
Chenxu Zhang
Lei Zhang
Mamba
41
26
0
07 May 2024
Spider: A Unified Framework for Context-dependent Concept Segmentation
Xiaoqi Zhao
Youwei Pang
Wei Ji
Baicheng Sheng
Jiaming Zuo
Lihe Zhang
Huchuan Lu
39
6
0
02 May 2024
RGB↔X: Image decomposition and synthesis using material- and lighting-aware diffusion models
Zheng Zeng
Valentin Deschaintre
Iliyan Georgiev
Yannick Hold-Geoffroy
Yiwei Hu
Fujun Luan
Ling-Qi Yan
Miloš Hašan
DiffM
42
36
0
01 May 2024
Visual Mamba: A Survey and New Outlooks
Rui Xu
Shu Yang
Yihui Wang
Yu Cai
Bo Du
Hao Chen
Mamba
42
26
0
29 Apr 2024
Rethinking Attention Gated with Hybrid Dual Pyramid Transformer-CNN for Generalized Segmentation in Medical Imaging
F. Bougourzi
Fadi Dornaika
Abdelmalik Taleb-Ahmed
Vinh Truong Hoang
MedIm
ViT
42
2
0
28 Apr 2024
Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification
Yingquan Wang
Pingping Zhang
Dong Wang
Huchuan Lu
ViT
42
7
0
23 Apr 2024