PVT v2: Improved Baselines with Pyramid Vision Transformer

25 June 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
    ViT
    AI4TS

Papers citing "PVT v2: Improved Baselines with Pyramid Vision Transformer"

50 / 550 papers shown
A Simple Detector with Frame Dynamics is a Strong Tracker
Chenxu Peng
C. Wang
Minrui Zou
Danyang Li
Z. Yang
Yimian Dai
Ming-Ming Cheng
Xiang Li
44
0
0
08 May 2025
Building-Guided Pseudo-Label Learning for Cross-Modal Building Damage Mapping
Jiepan Li
He Huang
Yu Sheng
Y. Guo
Wei He
46
0
0
08 May 2025
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
Sainath Dey
Mitul Goswami
Jashika Sethi
Prasant Kumar Pattnaik
ViT
30
0
0
07 May 2025
Fine-grained spatial-temporal perception for gas leak segmentation
Xinlong Zhao
Shan Du
39
0
0
01 May 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
52
1
0
28 Apr 2025
SRMF: A Data Augmentation and Multimodal Fusion Approach for Long-Tail UHR Satellite Image Segmentation
Yulong Guo
Zilun Zhang
Yongheng Shang
Tiancheng Zhao
Shuiguang Deng
Yingchun Yang
Jianwei Yin
68
0
0
28 Apr 2025
Frequency-Compensated Network for Daily Arctic Sea Ice Concentration Prediction
Jialiang Zhang
Feng Gao
Yanhai Gan
Junyu Dong
Q. Du
26
0
0
23 Apr 2025
PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling
Alara Dirik
Tuanfeng Y. Wang
Duygu Ceylan
Stefanos Zafeiriou
Anna Frühstück
DiffM
47
0
0
19 Apr 2025
Visual Consensus Prompting for Co-Salient Object Detection
J. T. Wang
Nana Yu
Zihao Zhang
Yahong Han
24
0
0
19 Apr 2025
FocusNet: Transformer-enhanced Polyp Segmentation with Local and Pooling Attention
Jun Zeng
KC Santosh
Deepak Rajan Nayak
Thomas de Lange
Jonas Varkey
Tyler Berzin
Debesh Jha
ViT
MedIm
38
0
0
18 Apr 2025
SC3EF: A Joint Self-Correlation and Cross-Correspondence Estimation Framework for Visible and Thermal Image Registration
Xi Tong
Xing Luo
Jiangxin Yang
Yanpeng Cao
31
0
0
17 Apr 2025
LightFormer: A lightweight and efficient decoder for remote sensing image segmentation
Sihang Chen
Lijun Yun
Z. Liu
JianFeng Zhu
J. Chen
Hui Wang
Yueping Nie
26
0
0
15 Apr 2025
PraNet-V2: Dual-Supervised Reverse Attention for Medical Image Segmentation
Bo-Cheng Hu
Ge-Peng Ji
Dian Shao
Deng-Ping Fan
25
0
0
15 Apr 2025
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li
Xingxuan Zhang
Hao Zou
Yige Guo
Renzhe Xu
Yilong Liu
Chuzhao Zhu
Yue He
Peng Cui
VLM
39
0
0
14 Apr 2025
DefMamba: Deformable Visual State Space Model
Leiye Liu
Miao Zhang
Jihao Yin
Tingwei Liu
Wei Ji
Yongri Piao
Huchuan Lu
Mamba
55
0
0
08 Apr 2025
RCCFormer: A Robust Crowd Counting Network Based on Transformer
Peng Liu
Heng Li
Sen Lei
Nanqing Liu
Bin Feng
Xiao Wu
31
0
0
07 Apr 2025
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo Yin
Jiao-Long Cao
Ming-Ming Cheng
Qibin Hou
3DPC
MDE
48
0
0
07 Apr 2025
EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively
Bingyang Wang
Kaer Huang
Bin Li
Yiqiang Yan
L. Zhang
Huchuan Lu
You He
VLM
37
0
0
07 Apr 2025
Marine Saliency Segmenter: Object-Focused Conditional Diffusion with Region-Level Semantic Knowledge Distillation
Laibin Chang
Yunke Wang
JiaXing Huang
Longxiang Deng
Bo Du
Chang Xu
DiffM
55
0
0
03 Apr 2025
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
J. Wang
Jingyuan Liu
Xin Sun
Krishna Kumar Singh
Zhixin Shu
...
Nanxuan Zhao
Tuanfeng Y. Wang
Simon Chen
Ulrich Neumann
Jae Shin Yoon
29
0
0
03 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
82
0
0
03 Apr 2025
MDP: Multidimensional Vision Model Pruning with Latency Constraint
Xinglong Sun
Barath Lakshmanan
Maying Shen
Shiyi Lan
Jingde Chen
Jose M. Alvarez
VLM
49
0
0
02 Apr 2025
CamoSAM2: Motion-Appearance Induced Auto-Refining Prompts for Video Camouflaged Object Detection
Xin Zhang
Keren Fu
Qijun Zhao
VGen
34
1
0
01 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
48
0
0
01 Apr 2025
LSNet: See Large, Focus Small
Ao Wang
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
42
0
0
29 Mar 2025
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li
Tanzhe Li
Xiaobin Hu
Donghao Luo
Taisong Jin
66
0
0
19 Mar 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu
Liying Yang
Peike Li
Dadong Wang
Lincheng Li
Xin Yu
VOS
99
0
0
17 Mar 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du
Guangyao Li
Chang Zhou
Chunjie Zhang
Alan Zhao
D. Hu
54
0
0
17 Mar 2025
3D Hierarchical Panoptic Segmentation in Real Orchard Environments Across Different Sensors
Matteo Sodano
Federico Magistri
E. Marks
Fares Hosn
Aibek Zurbayev
Rodrigo Marcuzzi
Meher V. R. Malladi
Jens Behley
C. Stachniss
41
0
0
17 Mar 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
65
0
0
17 Mar 2025
Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Jiaming Liu
Linghe Kong
Guihai Chen
73
0
0
08 Mar 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
115
1
0
27 Feb 2025
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu
David Aponte
Colby R. Banbury
Daniel P. Robinson
Tianyu Ding
K. Koishida
Ilya Zharkov
Tianyi Chen
MQ
59
1
0
23 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
45
0
0
11 Feb 2025
iFormer: Integrating ConvNet and Transformer for Mobile Application
Chuanyang Zheng
ViT
72
0
0
26 Jan 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
142
0
0
25 Jan 2025
ORCAst: Operational High-Resolution Current Forecasts
Pierre Garcia
Inès Larroche
Amélie Pesnec
Hannah Bull
Théo Archambault
Evangelos Moschos
Alexandre Stegner
A. Charantonis
Dominique Béréziat
AI4Cl
45
0
0
21 Jan 2025
A Survey on Deep Learning for Polyp Segmentation: Techniques, Challenges and Future Trends
Jiaxin Mei
Tao Zhou
Kaiwen Huang
Yizhe Zhang
Yi Zhou
Ye Wu
Huazhu Fu
153
12
0
20 Jan 2025
MARIO: A Mixed Annotation Framework For Polyp Segmentation
Haoyang Li
Y. Hu
Jun Wei
Zhen Li
31
0
0
19 Jan 2025
Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain
Juntao Zhang
Kun Bian
Peng Cheng
You Zhou
Jianning Liu
Wenbo An
Jun Zhou
Kun Shao
Mamba
52
2
0
08 Jan 2025
A Separable Self-attention Inspired by the State Space Model for Computer Vision
Juntao Zhang
Shaogeng Liu
Kun Bian
You Zhou
Pei Zhang
Jianning Liu
Jun Zhou
Bingyan Liu
Mamba
50
0
0
03 Jan 2025
SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network
Zhaoxu Li
Wei An
Gaowei Guo
Longguang Wang
Yingqian Wang
Zaiping Lin
ViT
85
0
0
03 Jan 2025
Boosting Adversarial Transferability with Spatial Adversarial Alignment
Zhaoyu Chen
Haijing Guo
Kaixun Jiang
Jiyuan Fu
Xinyu Zhou
Dingkang Yang
H. Tang
Bo-wen Li
Wenqiang Zhang
AAML
38
0
0
03 Jan 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
149
611
0
31 Dec 2024
SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection
Y. Li
X. Li
Yunheng Li
Y. Zhang
Yimian Dai
Qibin Hou
Ming-Ming Cheng
Jian Yang
26
6
0
31 Dec 2024
MAGIC++: Efficient and Resilient Modality-Agnostic Semantic Segmentation via Hierarchical Modality Selection
Xu Zheng
Yuanhuiyi Lyu
Lutao Jiang
Jiazhou Zhou
Lin Wang
Xuming Hu
74
4
0
22 Dec 2024
First-frame Supervised Video Polyp Segmentation via Propagative and Semantic Dual-teacher Network
Qiang Hu
Mei Liu
Qiang Li
Zhiwei Wang
72
0
0
21 Dec 2024
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu
Meng Lou
Yizhou Yu
112
1
0
16 Dec 2024
Unconstrained Salient and Camouflaged Object Detection
Zhangjun Zhou
Yiping Li
Chunlin Zhong
Jianuo Huang
Jialun Pei
He Tang
84
0
0
14 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
190
2
0
14 Dec 2024