Joint Temporal Pooling for Improving Skeleton-based Action Recognition

International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2023

18 August 2024

Shanaka Ramesh Gunasekara

Wanqing Li

Jack Yang

P. Ogunbona


Main: 6 Pages

6 Figures

Bibliography: 2 Pages

4 Tables

Abstract

In skeleton-based human action recognition, temporal pooling is a critical step for capturing the spatiotemporal relationships in joint dynamics. Conventional pooling methods treat every frame equally and do not preserve motion information. However, in an action sequence, only a few segments of frames carry discriminative information related to the action. This paper presents a novel Joint Motion Adaptive Temporal Pooling (JMAP) method for improving skeleton-based action recognition. Two variants of JMAP, frame-wise pooling and joint-wise pooling, are introduced. The efficacy of JMAP has been validated through experiments on the popular NTU RGB+D 120 and PKU-MMD datasets.
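
The contrast the abstract draws between uniform pooling and motion-aware pooling can be illustrated with a small sketch. This is not the JMAP method itself, whose details are not given in the abstract. It is a generic frame-level motion weighting, and the function names, the displacement-based motion measure, and the fallback to uniform pooling are all assumptions made for illustration.

```python
# Illustrative only: a generic motion-weighted temporal pooling,
# NOT the authors' JMAP. All names here are hypothetical.

def motion_energy(seq):
    """Per-frame motion: summed Euclidean joint displacement from the previous frame.

    seq is a list of T frames; each frame is a list of J joints (x, y, z).
    The first frame has no predecessor, so its energy is 0.
    """
    energies = [0.0]
    for prev, cur in zip(seq, seq[1:]):
        e = 0.0
        for p, c in zip(prev, cur):
            e += sum((ci - pi) ** 2 for pi, ci in zip(p, c)) ** 0.5
        energies.append(e)
    return energies


def uniform_pool(features):
    """Conventional average pooling: every frame contributes equally."""
    t = len(features)
    d = len(features[0])
    return [sum(f[k] for f in features) / t for k in range(d)]


def motion_weighted_pool(features, energies, eps=1e-8):
    """Weight each frame's feature vector by its share of the total motion."""
    total = sum(energies)
    if total < eps:
        # Assumption: a static sequence falls back to uniform pooling.
        return uniform_pool(features)
    weights = [e / total for e in energies]
    d = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(d)]


# One joint that moves only between frames 2 and 3.
skeleton = [[(0, 0, 0)], [(0, 0, 0)], [(3, 4, 0)], [(3, 4, 0)]]
# Hypothetical per-frame features (one dimension for brevity).
features = [[1.0], [1.0], [10.0], [1.0]]

energies = motion_energy(skeleton)
uniform = uniform_pool(features)
weighted = motion_weighted_pool(features, energies)
```

In this toy sequence uniform pooling dilutes the feature of the single moving frame, while the motion-weighted version is dominated by it. A joint-wise analogue would compute the weights per joint rather than per frame.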
