3D Skeleton-Based Action Recognition: A Review

1 June 2025

Main: 21 Pages

11 Figures

Bibliography: 9 Pages

Abstract

With the inherent advantages of skeleton representation, 3D skeleton-based action recognition has become a prominent topic in computer vision. However, previous reviews have predominantly adopted a model-oriented perspective, often neglecting the fundamental steps of the task beyond model design, which has hindered a deeper understanding of it. To bridge this gap, our review presents a comprehensive, task-oriented framework for understanding skeleton-based action recognition. We begin by decomposing the task into a series of sub-tasks, placing particular emphasis on preprocessing steps such as modality derivation and data augmentation. We then examine critical sub-tasks, including feature extraction and spatio-temporal modeling. Beyond foundational action recognition networks, we also highlight recently advanced frameworks such as hybrid architectures, Mamba models, large language models (LLMs), and generative models. Finally, we present a comprehensive overview of public 3D skeleton datasets, accompanied by an analysis of state-of-the-art algorithms evaluated on these benchmarks. By integrating task-oriented discussion, a comprehensive examination of sub-tasks, and an emphasis on the latest advancements, our review provides an accessible, structured roadmap for understanding and advancing the field of 3D skeleton-based action recognition.
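
As a rough illustration of what "modality derivation" commonly refers to in this literature, the sketch below derives a bone modality (the vector from each joint's parent to the joint) and a motion modality (per-joint displacement between consecutive frames) from raw joint coordinates. It is not taken from the review itself. The three-joint skeleton, the `PARENTS` list, and the function names `derive_bones` and `derive_motion` are hypothetical, chosen only for this example.

```python
# A minimal sketch, assuming a toy 3-joint chain (0 -> 1 -> 2) over 2 frames.
# Each frame is a list of (x, y, z) joint coordinates.
# PARENTS is an assumed parent index per joint; the root is its own parent.
PARENTS = [0, 0, 1]


def derive_bones(frame, parents):
    """Bone modality: vector from each joint's parent to the joint."""
    return [
        tuple(c - p for c, p in zip(frame[j], frame[parents[j]]))
        for j in range(len(frame))
    ]


def derive_motion(sequence):
    """Motion modality: per-joint displacement between consecutive frames."""
    return [
        [tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(sequence, sequence[1:])
    ]


sequence = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],
]

bones = [derive_bones(frame, PARENTS) for frame in sequence]
motion = derive_motion(sequence)
```

Here `bones` has one entry per frame and `motion` has one entry fewer than the number of frames, since each motion frame is a difference between two consecutive frames. Real datasets use larger skeletons with their own joint hierarchies, so the parent list would need to match the dataset in use.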

@article{liu2025_2506.00915,
  title={3D Skeleton-Based Action Recognition: A Review},
  author={Mengyuan Liu and Hong Liu and Qianshuo Hu and Bin Ren and Junsong Yuan and Jiaying Lin and Jiajun Wen},
  journal={arXiv preprint arXiv:2506.00915},
  year={2025}
}
