CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection

Hongyi Cai
Mohammad Mahdinur Rahman
Jingyu Wu
Yulun Deng
Abstract

Feature pyramids have been widely adopted in convolutional neural networks and transformers for medical image segmentation tasks. However, existing models generally focus on the encoder-side transformer for feature extraction. We explore the potential of improving the feature decoder with a well-designed architecture. We propose the Cross Feature Pyramid Transformer decoder (CFPFormer), a novel decoder block that integrates feature pyramids and transformers. Even though transformer-like architectures impress with outstanding segmentation performance, concerns about redundancy and training costs remain. Specifically, by leveraging patch embedding and cross-layer feature concatenation mechanisms, CFPFormer enhances feature extraction capabilities, while the complexity issue is mitigated by our Gaussian Attention. Benefiting from the transformer structure and U-shaped connections, our work is capable of capturing long-range dependencies and effectively up-sampling feature maps. Experimental results on medical image segmentation datasets demonstrate the effectiveness of CFPFormer. With a ResNet50 backbone, our method achieves a 92.02% Dice score. Notably, our VGG-based model outperforms baselines built on more complex ViT and Swin Transformer backbones.
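The abstract does not spell out the Gaussian Attention formulation, but a common way such mechanisms reduce attention redundancy is to damp attention weights with a Gaussian decay over the query-key positional distance. The sketch below is a minimal, hypothetical illustration of that idea in pure Python; the function name, the post-softmax multiplicative bias, and the `sigma` parameter are all assumptions, not the paper's actual definition.

```python
import math

def gaussian_attention(scores, sigma=2.0):
    """Hypothetical sketch: softmax attention with a Gaussian locality bias.

    `scores` is an n x n list of raw attention logits. Each row is
    softmax-normalized, then multiplied by exp(-(i - j)^2 / (2 * sigma^2)),
    which down-weights distant positions, and re-normalized. This is an
    assumed formulation for illustration, not the paper's exact method.
    """
    n = len(scores)
    out = []
    for i, row in enumerate(scores):
        # Numerically stable softmax over row i.
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Gaussian decay centered at the query position i.
        biased = [w * math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
                  for j, w in enumerate(weights)]
        zb = sum(biased)
        out.append([b / zb for b in biased])
    return out
```

With uniform logits, the biased weights concentrate around each query's own position, which is the locality effect a Gaussian bias is meant to produce.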

@article{cai2025_2404.15451,
  title={CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection},
  author={Hongyi Cai and Mohammad Mahdinur Rahman and Wenzhen Dong and Jingyu Wu},
  journal={arXiv preprint arXiv:2404.15451},
  year={2025}
}