
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation

Main: 17 pages, 4 figures, 3 tables
Abstract

Existing monocular 3D pose estimation methods rely primarily on joint positional features while overlooking the intrinsic directional and angular correlations within the skeleton. As a result, they often produce implausible poses under joint occlusion or rapid motion changes. To address these challenges, we propose the PoseGRAF framework. We first construct a dual graph convolutional structure that processes joint and bone graphs separately, effectively capturing their local dependencies. A cross-attention module is then introduced to model the interdependencies between bone directions and joint features. Building on this, a dynamic fusion module is designed to adaptively integrate both feature types by leveraging the relational dependencies between joints and bones. An improved Transformer encoder is further incorporated in a residual manner to generate the final output. Experimental results on the Human3.6M and MPI-INF-3DHP datasets show that our method exceeds state-of-the-art approaches. Additional evaluations on in-the-wild videos further validate its generalizability. The code is publicly available at this https URL.
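The pipeline the abstract describes (dual graph branches, cross-attention between bone and joint features, and a gated dynamic fusion) can be sketched in NumPy. This is a minimal illustration under assumed shapes and random stand-in weights, not the paper's implementation; the layer sizes, adjacency matrices, and the sigmoid gate used for "dynamic fusion" are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

J = 17   # joints (Human3.6M skeleton)
B = 16   # bones = J - 1 for a tree-structured skeleton
D = 32   # assumed feature dimension

def graph_conv(X, A, W):
    """One simple GCN layer: ReLU(A_norm @ X @ W), row-normalized adjacency."""
    deg = A.sum(axis=1, keepdims=True)
    A_norm = A / np.maximum(deg, 1.0)
    return np.maximum(A_norm @ X @ W, 0.0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Random stand-ins for the joint/bone adjacency, features, and weights
A_joint = (rng.random((J, J)) < 0.2).astype(float) + np.eye(J)
A_bone  = (rng.random((B, B)) < 0.2).astype(float) + np.eye(B)
X_joint = rng.standard_normal((J, D))
X_bone  = rng.standard_normal((B, D))   # e.g. encoded bone directions
W_j = rng.standard_normal((D, D)) * 0.1
W_b = rng.standard_normal((D, D)) * 0.1

# Dual graph convolutional branches
H_j = graph_conv(X_joint, A_joint, W_j)   # joint-graph branch, (J, D)
H_b = graph_conv(X_bone,  A_bone,  W_b)   # bone-graph branch,  (B, D)

# Cross-attention: each joint attends over bone-direction features
attn = softmax(H_j @ H_b.T / np.sqrt(D))  # (J, B) attention weights
H_cross = attn @ H_b                      # bone context per joint, (J, D)

# Dynamic fusion: element-wise sigmoid gate blending the two streams
gate = 1.0 / (1.0 + np.exp(-(H_j + H_cross)))
H_fused = gate * H_j + (1.0 - gate) * H_cross   # (J, D) fused features
```

The fused per-joint features would then feed the residual Transformer encoder mentioned in the abstract; that stage is omitted here.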

@article{xu2025_2506.14596,
  title={PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation},
  author={Ming Xu and Xu Zhang},
  journal={arXiv preprint arXiv:2506.14596},
  year={2025}
}