2102.12122
Cited By
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
24 February 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
Papers citing
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions"
50 / 594 papers shown
Title
Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
Ping Wang
Yulun Zhang
Lishun Wang
Xin Yuan
ViT
31
1
0
16 Jul 2024
FoodMem: Near Real-time and Precise Food Video Segmentation
Ahmad AlMughrabi
Adrián Galán
Ricardo Marques
P. Radeva
VOS
38
1
0
16 Jul 2024
GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
Haonan Wang
Jie Liu
Jie Tang
Gangshan Wu
Bo Xu
Y. Kevin Chou
Yong Wang
ViT
36
2
0
15 Jul 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
45
56
0
10 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
46
4
0
10 Jul 2024
HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification
Omar S. El-Assiouti
Ghada Hamed
Dina Khattab
H. M. Ebied
42
1
0
10 Jul 2024
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang
Yulun Zhang
Fisher Yu
42
15
0
08 Jul 2024
Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection
F. Barbato
Umberto Michieli
J. Moon
Pietro Zanuttigh
Mete Ozay
42
2
0
01 Jul 2024
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang
Siyue Yu
Yunchao Wei
Yao Zhao
Jimin Xiao
VLM
35
8
0
17 Jun 2024
Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
Xiangyang Yang
Dan Zeng
Xucheng Wang
You Wu
Hengzhou Ye
Qijun Zhao
Shuiwang Li
59
3
0
12 Jun 2024
GrootVL: Tree Topology is All You Need in State Space Model
Yicheng Xiao
Lin Song
Shaoli Huang
Jiangshan Wang
Siyu Song
Yixiao Ge
Xiu Li
Ying Shan
Mamba
44
10
0
04 Jun 2024
Low-Rank Adaption on Transformer-based Oriented Object Detector for Satellite Onboard Processing of Remote Sensing Images
Xinyang Pu
Feng Xu
32
3
0
04 Jun 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
48
5
0
22 May 2024
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo
Xinghao Chen
Yehui Tang
Yunhe Wang
ViT
49
9
0
19 May 2024
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong
You-Chen Liu
Lai Xing Ng
Benoit R. Cottereau
Wei Tsang Ooi
VLM
34
14
0
08 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
37
3
0
02 May 2024
A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation
Xin Zhang
Liangxiu Han
Tam Sobeih
Lianghao Han
Darren Dancey
58
1
0
26 Apr 2024
Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing
Yuang Liu
Zhiheng Qiu
Xiaokai Qin
ViT
31
0
0
20 Apr 2024
Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution
Cansu Korkmaz
A. Murat Tekalp
ViT
44
6
0
17 Apr 2024
PillarTrack: Boosting Pillar Representation for Transformer-based 3D Single Object Tracking on Point Clouds
Weisheng Xu
Sifan Zhou
Jiaqi Xiong
Ziyu Zhao
Zhihang Yuan
45
2
0
11 Apr 2024
LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation
Ngoc-Du Tran
Thi-Thao Tran
Quang-Huy Nguyen
Manh-Hung Vu
Van-Truong Pham
MedIm
ViT
39
1
0
04 Apr 2024
SpiralMLP: A Lightweight Vision MLP Architecture
Haojie Mu
Burhan Ul Tayyab
Nicholas Chua
43
0
0
31 Mar 2024
Efficient Modulation for Vision Networks
Xu Ma
Xiyang Dai
Jianwei Yang
Bin Xiao
Yinpeng Chen
Yun Fu
Lu Yuan
43
17
0
29 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
51
7
0
28 Mar 2024
Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data
Mohamed Harmanani
P. Wilson
Fahimeh Fooladgar
A. Jamzad
Mahdi Gilany
Minh Nguyen Nhat To
Brian Wodlinger
Purang Abolmaesumi
P. Mousavi
ViT
MedIm
27
1
0
27 Mar 2024
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
Ting Yao
Yehao Li
Yingwei Pan
Tao Mei
ViT
28
15
0
18 Mar 2024
D-Net: Dynamic Large Kernel with Dynamic Feature Fusion for Volumetric Medical Image Segmentation
Jin Yang
Peijie Qiu
Yichi Zhang
Daniel S. Marcus
Aristeidis Sotiras
MedIm
46
9
0
15 Mar 2024
Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration
Jingyun Xue
Tao Wang
Jun Wang
Kaihao Zhang
ViT
48
2
0
09 Mar 2024
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Yuelin Zhang
Pengyu Zheng
Wanquan Yan
Chengyu Fang
Shing Shin Cheng
MedIm
37
7
0
05 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
46
44
0
04 Mar 2024
Zero-shot generalization across architectures for visual classification
Evan Gerritz
Luciano Dyballa
Steven W. Zucker
29
1
0
21 Feb 2024
Learning Pixel-wise Continuous Depth Representation via Clustering for Depth Completion
Shenglun Chen
Hong Zhang
Xinzhu Ma
Zhihui Wang
Haojie Li
29
2
0
21 Feb 2024
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
55
4
0
17 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Yue Fan
Qing Li
Yuntao Du
VLM
73
75
0
03 Feb 2024
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
Mingxin Huang
Dezhi Peng
Hongliang Li
Zhenghao Peng
Chongyu Liu
Dahua Lin
Yuliang Liu
Xiang Bai
Lianwen Jin
77
1
0
15 Jan 2024
360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception
Zhijie Shen
Chunyu Lin
Junsong Zhang
Lang Nie
K. Liao
Yao Zhao
28
5
0
26 Dec 2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Zhaoyang Zhang
Wenqi Shao
Yixiao Ge
Xiaogang Wang
Liang Feng
Ping Luo
16
2
0
20 Dec 2023
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders
Bumsoo Kim
Jinhyung Kim
Yeonsik Jo
S. Kim
VLM
23
3
0
19 Dec 2023
MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention
Hao Shao
Quansheng Zeng
Qibin Hou
Jufeng Yang
51
13
0
14 Dec 2023
Diffusion for Natural Image Matting
Yihan Hu
Yiheng Lin
Wei Wang
Yao-Min Zhao
Yunchao Wei
Humphrey Shi
28
7
0
10 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
44
0
0
01 Dec 2023
QuadraNet: Improving High-Order Neural Interaction Efficiency with Hardware-Aware Quadratic Neural Networks
Chenhui Xu
Fuxun Yu
Zirui Xu
Chenchen Liu
Jinjun Xiong
Xiang Chen
33
4
0
29 Nov 2023
HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation
Chengpeng Wu
Guangxing Tan
Chunyu Li
ViT
21
0
0
22 Nov 2023
Zero-Shot Digital Rock Image Segmentation with a Fine-Tuned Segment Anything Model
Zhaoyang Ma
Xupeng He
Shuyu Sun
Bicheng Yan
Hyung Kwak
Jun Gao
23
5
0
17 Nov 2023
Semi-supervised ViT knowledge distillation network with style transfer normalization for colorectal liver metastases survival prediction
Mohamed El Amine Elforaici
E. Montagnon
Francisco Perdigon Romero
W. Le
F. Azzi
Dominique Trudel
Bich Nguyen
Simon Turcotte
An Tang
Samuel Kadoury
MedIm
33
2
0
17 Nov 2023
Dynamic Association Learning of Self-Attention and Convolution in Image Restoration
Kui Jiang
Xuemei Jia
Wenxin Huang
Wenbin Wang
Zheng Wang
Junjun Jiang
20
1
0
09 Nov 2023
Rotation Invariant Transformer for Recognizing Object in UAVs
Shuo Chen
Mang Ye
Bo Du
ViT
32
18
0
05 Nov 2023
PAUMER: Patch Pausing Transformer for Semantic Segmentation
Evann Courdier
Prabhu Teja Sivaprasad
F. Fleuret
37
2
0
01 Nov 2023
Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation
Shashank Kotyan
Danilo Vasconcellos Vargas
ViT
27
2
0
01 Nov 2023
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
Meng Lou
Hong-Yu Zhou
Sibei Yang
Yizhou Yu
Chuan Wu
ViT
44
36
0
30 Oct 2023