ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.12122
  4. Cited By
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
  without Convolutions

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

24 February 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
    ViT
ArXivPDFHTML

Papers citing "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions"

50 / 518 papers shown
Title
Dual-level Fuzzy Learning with Patch Guidance for Image Ordinal Regression
Dual-level Fuzzy Learning with Patch Guidance for Image Ordinal Regression
Chunlai Dong
Haochao Ying
Qibo Qiu
Jinhong Wang
D. Z. Chen
J. Wu
41
0
0
09 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
Image Recognition with Online Lightweight Vision Transformer: A Survey
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
W. Xu
Shibiao Xu
ViT
139
0
0
06 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
51
0
0
05 May 2025
Token Coordinated Prompt Attention is Needed for Visual Prompting
Token Coordinated Prompt Attention is Needed for Visual Prompting
Zichen Liu
Xu Zou
Gang Hua
Jiahuan Zhou
34
0
0
05 May 2025
Always Skip Attention
Always Skip Attention
Yiping Ji
Hemanth Saratchandran
Peyman Moghaddam
Simon Lucey
139
0
0
04 May 2025
HMPE:HeatMap Embedding for Efficient Transformer-Based Small Object Detection
HMPE:HeatMap Embedding for Efficient Transformer-Based Small Object Detection
YangChen Zeng
ViT
31
0
0
18 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
79
0
0
03 Apr 2025
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Hao Ai
Kunyi Wang
Zezhou Wang
H. Lu
Jin Tian
Yaxin Luo
Peng-Fei Xing
Jen-Yuan Huang
Huaxia Li
Gen Luo
MLLM
VLM
108
0
0
26 Mar 2025
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Ethan Griffiths
Maryam Haghighat
Simon Denman
Clinton Fookes
Milad Ramezani
3DPC
59
0
0
11 Mar 2025
Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Jiaming Liu
Linghe Kong
Guihai Chen
73
0
0
08 Mar 2025
Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations
Junsong Zhang
Chunyu Lin
Zhijie Shen
Lang Nie
K. Liao
Yao Zhao
33
0
0
03 Mar 2025
VRM: Knowledge Distillation via Virtual Relation Matching
VRM: Knowledge Distillation via Virtual Relation Matching
W. Zhang
Fei Xie
Weidong Cai
Chao Ma
71
0
0
28 Feb 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
115
1
0
27 Feb 2025
Road Traffic Sign Recognition method using Siamese network Combining Efficient-CNN based Encoder
Road Traffic Sign Recognition method using Siamese network Combining Efficient-CNN based Encoder
Zhenghao Xi
Yuchao Shao
Yang Zheng
Xiang Liu
Yaqi Liu
Yitong Cai
55
0
0
24 Feb 2025
Enhancing Vehicle Make and Model Recognition with 3D Attention Modules
Enhancing Vehicle Make and Model Recognition with 3D Attention Modules
Narges Semiromizadeh
Omid Nejati Manzari
S. B. Shokouhi
S. Mirzakuchaki
ViT
86
0
0
24 Feb 2025
Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Yongqi Dong
Xingmin Lu
Ruohan Li
Wei Song
B. Arem
Haneen Farah
ViT
107
1
0
21 Feb 2025
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
Shilong Yang
Qi Zang
Chulong Zhang
Lingfeng Huang
Yaoqin Xie
Mamba
63
1
0
16 Feb 2025
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri
Carola-Bibiane Schönlieb
Bruno Ribeiro
Chaim Baskin
Moshe Eliasof
41
0
0
09 Feb 2025
Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection
Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection
Xiangyu Gao
Yu Dai
Benliu Qiu
Hongliang Li
Heqian Qiu
Hongliang Li
ObjD
VLM
136
0
0
28 Jan 2025
iFormer: Integrating ConvNet and Transformer for Mobile Application
iFormer: Integrating ConvNet and Transformer for Mobile Application
Chuanyang Zheng
ViT
72
0
0
26 Jan 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
136
0
0
25 Jan 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Jinwei Gu
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
146
0
0
21 Jan 2025
DX2CT: Diffusion Model for 3D CT Reconstruction from Bi or Mono-planar 2D X-ray(s)
DX2CT: Diffusion Model for 3D CT Reconstruction from Bi or Mono-planar 2D X-ray(s)
Yun Su Jeong
Hye Bin Yoo
Il Yong Chun
DiffM
MedIm
61
0
0
20 Jan 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Z. Chen
Mingxiao Li
Z. Chen
Nan Du
Xiaolong Li
Yuexian Zou
53
0
0
19 Jan 2025
Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution
Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution
Cuixin Yang
Rongkang Dong
Jun Xiao
Cong Zhang
Kin-Man Lam
Fei Zhou
Guoping Qiu
87
1
0
17 Jan 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
67
2
0
14 Jan 2025
Causal Deep Learning
Causal Deep Learning
M. Alex O. Vasilescu
CML
54
2
1
03 Jan 2025
VMamba: Visual State Space Model
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
149
611
0
31 Dec 2024
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu
Meng Lou
Yizhou Yu
112
1
0
16 Dec 2024
Video Diffusion Transformers are In-Context Learners
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
187
2
0
14 Dec 2024
Breaking the Low-Rank Dilemma of Linear Attention
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
40
0
0
12 Nov 2024
Task Consistent Prototype Learning for Incremental Few-shot Semantic
  Segmentation
Task Consistent Prototype Learning for Incremental Few-shot Semantic Segmentation
Wenbo Xu
Yanan Wu
Haoran Jiang
Yang Wang
Qiang Wu
Jian Andrew Zhang
CLL
VLM
21
0
0
16 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for
  Vision-Language Pre-Training
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
54
9
0
16 Oct 2024
HASN: Hybrid Attention Separable Network for Efficient Image
  Super-resolution
HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution
Weifeng Cao
Xiaoyan Lei
Jun Shi
Wanyong Liang
Jie Liu
Zongfei Bai
SupR
26
0
0
13 Oct 2024
SkinMamba: A Precision Skin Lesion Segmentation Architecture with
  Cross-Scale Global State Modeling and Frequency Boundary Guidance
SkinMamba: A Precision Skin Lesion Segmentation Architecture with Cross-Scale Global State Modeling and Frequency Boundary Guidance
Shun Zou
Mingya Zhang
Bingjian Fan
Zhengyi Zhou
Xiuguo Zou
Mamba
26
3
0
17 Sep 2024
Brain-Inspired Stepwise Patch Merging for Vision Transformers
Brain-Inspired Stepwise Patch Merging for Vision Transformers
Yonghao Yu
Dongcheng Zhao
Guobin Shen
Yiting Dong
Yi Zeng
45
0
0
11 Sep 2024
iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation
iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation
Hayeon Jo
Hyesong Choi
Minhee Cho
Dongbo Min
34
1
0
04 Sep 2024
Physically Feasible Semantic Segmentation
Physically Feasible Semantic Segmentation
Shamik Basu
Luc Van Gool
Christos Sakaridis
28
1
0
26 Aug 2024
Accuracy Improvement of Cell Image Segmentation Using Feedback Former
Accuracy Improvement of Cell Image Segmentation Using Feedback Former
Hinako Mitsuoka
Kazuhiro Hotta
ViT
MedIm
34
0
0
23 Aug 2024
MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient
  Semantic Segmentation
MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation
Beoungwoo Kang
Seunghun Moon
Yubin Cho
Hyunwoo Yu
Suk-Ju Kang
ViT
MedIm
24
8
0
14 Aug 2024
MacFormer: Semantic Segmentation with Fine Object Boundaries
MacFormer: Semantic Segmentation with Fine Object Boundaries
Guoan Xu
Wenfeng Huang
Tao Wu
Ligeng Chen
Wenjing Jia
Guangwei Gao
Xiatian Zhu
Stuart W. Perry
36
0
0
11 Aug 2024
Modelling Visual Semantics via Image Captioning to extract Enhanced
  Multi-Level Cross-Modal Semantic Incongruity Representation with Attention
  for Multimodal Sarcasm Detection
Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection
Sajal Aggarwal
Ananya Pandey
Dinesh Kumar Vishwakarma
41
1
0
05 Aug 2024
Depth-Wise Convolutions in Vision Transformers for Efficient Training on
  Small Datasets
Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets
Tianxiao Zhang
Wenju Xu
Bo Luo
Guanghui Wang
ViT
MDE
40
7
0
28 Jul 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
68
1
0
23 Jul 2024
SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams
SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams
Liangyan Jiang
Chuang Zhu
Yanxu Chen
50
2
0
22 Jul 2024
Exploring Deeper! Segment Anything Model with Depth Perception for
  Camouflaged Object Detection
Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection
Zhenni Yu
Xiaoqin Zhang
Li Zhao
Yi Bin
Guobao Xiao
VLM
34
7
0
17 Jul 2024
Hierarchical Separable Video Transformer for Snapshot Compressive
  Imaging
Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
Ping Wang
Yulun Zhang
Lishun Wang
Xin Yuan
ViT
26
1
0
16 Jul 2024
FoodMem: Near Real-time and Precise Food Video Segmentation
FoodMem: Near Real-time and Precise Food Video Segmentation
Ahmad AlMughrabi
Adrián Galán
Ricardo Marques
P. Radeva
VOS
36
1
0
16 Jul 2024
GTPT: Group-based Token Pruning Transformer for Efficient Human Pose
  Estimation
GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
Haonan Wang
Jie Liu
Jie Tang
Gangshan Wu
Bo Xu
Y. Kevin Chou
Yong Wang
ViT
36
2
0
15 Jul 2024
1234...91011
Next