
TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding

Main: 15 pages · Appendix: 2 pages · Bibliography: 1 page
5 figures · 5 tables
Abstract

Transformers excel at capturing long-range dependencies, whereas State Space Models (SSMs) facilitate linear-time sequence modeling. Notwithstanding their synergistic potential, integrating these architectures presents a significant challenge, primarily attributable to a fundamental incongruity in their respective positional encoding mechanisms: Transformers rely on explicit Rotary Position Embeddings (RoPE), while SSMs leverage implicit positional representations via convolutions. This divergence often precipitates discontinuities and suboptimal performance. To address this impediment, we propose a unified rotary position embedding (Unified RoPE) methodology, thereby establishing a consistent positional encoding framework for both self-attention and state-space components. Using this Unified RoPE, we introduce TransXSSM, a hybrid architecture that coherently integrates Transformer and SSM layers under this unified positional encoding scheme. At a 4K sequence length, TransXSSM exhibits training and inference speeds that are 42.3% and 29.5% faster, respectively, than standard Transformer models. It also delivers higher accuracy: under comparable settings, it surpasses a Transformer baseline by over 4% on language modeling benchmarks. TransXSSM furthermore scales more effectively: TransXSSM-1.3B gains 7.22% in average accuracy over its 320M version (versus gains of about 6% for equivalent Transformers or SSMs). Our results show that unified positional encoding resolves positional incompatibility in hybrid models, enabling efficient, high-performance long-context modeling.
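
The paper's implementation is not reproduced on this page; the minimal NumPy sketch below only illustrates the general idea of sharing a single rotary position encoding between attention and state-space layers. The helper names (rope_angles, apply_rope) and the choice of which SSM tensors to rotate (b_proj, c_proj, standing in for the SSM's input/output projections) are illustrative assumptions, not the paper's actual Unified RoPE formulation.

import numpy as np

def rope_angles(positions, dim, base=10000.0):
    # Per-position rotation angles for consecutive feature pairs (standard RoPE schedule).
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))   # shape: (dim/2,)
    return np.outer(positions, inv_freq)                      # shape: (seq, dim/2)

def apply_rope(x, angles):
    # Rotate consecutive feature pairs of x by the given angles.
    # x: (seq, dim), angles: (seq, dim/2)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# One shared set of angles for every layer type (hypothetical usage).
seq, dim = 8, 16
angles = rope_angles(np.arange(seq), dim)

# 1) Attention branch: rotate queries and keys before the dot product,
#    so the logits encode relative position.
q = apply_rope(np.random.randn(seq, dim), angles)
k = apply_rope(np.random.randn(seq, dim), angles)
scores = q @ k.T / np.sqrt(dim)

# 2) SSM branch: rotate the analogous input/output projections so the
#    recurrence sees the same positional phase as the attention layers.
b_proj = apply_rope(np.random.randn(seq, dim), angles)
c_proj = apply_rope(np.random.randn(seq, dim), angles)

Because both branches consume the same angle table, positional information stays consistent when Transformer and SSM blocks are interleaved, which is the compatibility property the abstract argues for.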

@article{wu2025_2506.09507,
  title={TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding},
  author={Bingheng Wu and Jingze Shi and Yifan Wu and Nan Tang and Yuyu Luo},
  journal={arXiv preprint arXiv:2506.09507},
  year={2025}
}