HuMoCon: Concept Discovery for Human Motion Understanding

27 May 2025

Main: 8 pages, Bibliography: 4 pages, Appendix: 6 pages, 12 figures, 7 tables

Abstract

We present HuMoCon, a novel motion-video understanding framework designed for advanced human behavior analysis. At its core is a human motion concept discovery framework that efficiently trains multi-modal encoders to extract semantically meaningful and generalizable features. HuMoCon addresses key challenges in motion concept discovery for understanding and reasoning, including the lack of explicit multi-modality feature alignment and the loss of high-frequency information in masked autoencoding frameworks. Our approach integrates a feature alignment strategy that leverages video for contextual understanding and motion for fine-grained interaction modeling, together with a velocity reconstruction mechanism that enhances high-frequency feature expression and mitigates temporal over-smoothing. Comprehensive experiments on standard benchmarks demonstrate that HuMoCon enables effective motion concept discovery and significantly outperforms state-of-the-art methods in training large models for human motion understanding. We will open-source the code associated with our paper.
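To illustrate why a velocity term can counteract temporal over-smoothing, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the loss, so the finite-difference velocity, the mean-squared-error form, and every name here (`velocities`, `mse`, `reconstruction_loss`, `velocity_weight`) are assumptions made for illustration only.

```python
from typing import List

Sequence = List[List[float]]  # frames x features (hypothetical layout)


def velocities(seq: Sequence) -> Sequence:
    """First-order finite differences between consecutive frames."""
    return [[b - a for a, b in zip(prev, nxt)] for prev, nxt in zip(seq, seq[1:])]


def mse(x: Sequence, y: Sequence) -> float:
    """Mean squared error over all frames and features."""
    diffs = [(a - b) ** 2 for fx, fy in zip(x, y) for a, b in zip(fx, fy)]
    if not diffs:
        raise ValueError("sequences must contain at least one value")
    return sum(diffs) / len(diffs)


def reconstruction_loss(target: Sequence, recon: Sequence,
                        velocity_weight: float = 1.0) -> float:
    """Position error plus a weighted velocity error (assumed form)."""
    position_term = mse(target, recon)
    velocity_term = mse(velocities(target), velocities(recon))
    return position_term + velocity_weight * velocity_term


# A 1-D zig-zag "motion" and an over-smoothed (constant) reconstruction.
target = [[0.0], [1.0], [0.0], [1.0], [0.0]]
smoothed = [[0.5], [0.5], [0.5], [0.5], [0.5]]

position_only = reconstruction_loss(target, smoothed, velocity_weight=0.0)
with_velocity = reconstruction_loss(target, smoothed, velocity_weight=1.0)
print(position_only, with_velocity)  # prints 0.25 1.25
```

In this toy case the constant reconstruction incurs only a modest position error, while the velocity term adds a much larger penalty because all frame-to-frame change has been lost. That is the intuition behind penalizing velocity error, under the assumptions stated above.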


@article{fang2025_2505.20920,
  title={HuMoCon: Concept Discovery for Human Motion Understanding},
  author={Qihang Fang and Chengcheng Tang and Bugra Tekin and Shugao Ma and Yanchao Yang},
  journal={arXiv preprint arXiv:2505.20920},
  year={2025}
}
