Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition

Abstract

Skeleton-based gesture recognition methods have achieved considerable success using Graph Convolutional Networks (GCNs). In addition, context-dependent adaptive topology, which supplies neighborhood vertex information, and attention mechanisms help a model represent actions better. In this paper, we propose a hybrid self-attention GCN model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN, which improves modeling ability and achieves state-of-the-art results on several datasets. We use a spatial self-attention module with adaptive topology to capture intra-frame interactions among different body parts, and a temporal self-attention module to examine correlations between frames for a given node. These two modules are followed by a multi-scale convolution network with dilations, which captures not only the long-range temporal dependencies of joints but also the long-range spatial dependencies (i.e., long-distance dependencies) of node temporal behaviors. The resulting features are combined into high-level spatial-temporal representations, and a softmax classifier outputs the predicted action.
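
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give layer sizes, the order of the two attention modules, or how the dilated branches are merged. The sketch assumes spatial attention runs before temporal attention, that the adaptive topology enters as a learned additive bias on the joint-to-joint attention scores, and that the dilated branches are averaged. All names (`self_attention`, `dilated_temporal_conv`, `msst_forward_sketch`, the `params` keys) and all shapes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v, bias=None):
    """Scaled dot-product self-attention over the second-to-last axis of x (..., N, C)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if bias is not None:
        # Assumption: adaptive topology acts as an additive (N, N) bias on the scores.
        scores = scores + bias
    return softmax(scores, axis=-1) @ v


def dilated_temporal_conv(x, kernel, dilation):
    """Zero-padded, same-length 1D convolution along time.

    x: (T, V, C) frames x joints x channels; kernel: (K, C, C) with odd K.
    """
    k = kernel.shape[0]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    t = x.shape[0]
    out = np.zeros_like(x)
    for i in range(k):
        # Tap i reads the frame shifted by (i - K//2) * dilation steps.
        out += xp[i * dilation : i * dilation + t] @ kernel[i]
    return out


def msst_forward_sketch(x, params, dilations=(1, 2, 3)):
    """Spatial attention -> temporal attention -> multi-scale dilated conv -> softmax."""
    # Spatial: joints attend to joints within each frame; x is (T, V, C).
    s = self_attention(x, *params["spatial"], bias=params["topology"])
    # Temporal: frames attend to frames for each joint; work on (V, T, C).
    t = self_attention(np.swapaxes(s, 0, 1), *params["temporal"])
    t = np.swapaxes(t, 0, 1)
    # Multi-scale: one branch per dilation rate, averaged (an assumption).
    branches = [
        dilated_temporal_conv(t, kern, d)
        for kern, d in zip(params["kernels"], dilations)
    ]
    h = np.mean(branches, axis=0)
    # Global average pooling over frames and joints, then a linear classifier.
    pooled = h.mean(axis=(0, 1))
    return softmax(pooled @ params["classifier"])


# Hypothetical sizes: 16 frames, 25 joints, 8 channels, 5 action classes.
rng = np.random.default_rng(0)
T, V, C, NUM_CLASSES = 16, 25, 8, 5
DILATIONS = (1, 2, 3)


def rand(*shape):
    return rng.normal(scale=0.1, size=shape)


params = {
    "spatial": [rand(C, C) for _ in range(3)],
    "temporal": [rand(C, C) for _ in range(3)],
    "topology": rand(V, V),
    "kernels": [rand(3, C, C) for _ in DILATIONS],
    "classifier": rand(C, NUM_CLASSES),
}
x = rng.normal(size=(T, V, C))
probs = msst_forward_sketch(x, params, DILATIONS)
```

Here `probs` is a vector of `NUM_CLASSES` class probabilities for a single skeleton sequence. With kernel size 3, a branch with dilation `d` reads frames `t - d`, `t`, and `t + d`, so larger dilation rates widen the temporal receptive field without adding parameters.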

