TSM-Pose: Topology-Aware Learning with Semantic Mamba for Category-Level Object Pose Estimation

18 April 2026

Jinshuo Liu

Bingtao Ma

Junlin Su

Guanyuan Pan

Beining Wu

Cheng Yang

Jiaxuan Lu

Chenggang Yan

Shuai Wang

Main: 7 pages; Bibliography: 2 pages; Appendix: 3 pages; 4 figures; 8 tables

Abstract

Category-level object pose estimation is fundamental for embodied intelligence, yet robust generalization to unseen instances remains challenging. Existing methods mainly rely on simple feature extraction and aggregation, which struggle to capture category-shared topological structures and to model semantic keypoints, limiting their generalization. To address these limitations, we propose TSM-Pose, a Topology-Aware Learning with Semantic Mamba framework for Category-Level Pose Estimation. Specifically, we introduce a Topology Extractor that captures the global topological representation of the point cloud and integrates it into local geometric features, enabling a robust category-level structural representation. In parallel, we propose a Mamba-based Global Semantic Aggregator that injects semantic priors into keypoints to enhance their expressiveness and uses multiple TwinMamba blocks to model long-range dependencies for more effective global feature aggregation. Extensive experiments on three benchmark datasets (REAL275, CAMERA25, and HouseCat6D) demonstrate that TSM-Pose outperforms existing state-of-the-art methods.
