Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques

Human activity recognition (HAR) is often limited by the scarcity of labeled datasets due to the high cost and complexity of real-world data collection. To mitigate this, recent work has explored generating virtual inertial measurement unit (IMU) data via cross-modality transfer. While video-based and language-based pipelines have each shown promise, they differ in assumptions and computational cost. Moreover, their effectiveness relative to traditional sensor-level data augmentation remains unclear. In this paper, we present a direct comparison of these two virtual IMU generation approaches against classical data augmentation techniques. We construct a large-scale virtual IMU dataset spanning 100 diverse activities from Kinetics-400 and simulate sensor signals at 22 body locations. The three data generation strategies are evaluated on benchmark HAR datasets (UTD-MHAD, PAMAP2, HAD-AW) using four popular models. Results show that virtual IMU data significantly improves performance over real or augmented data alone, particularly under limited-data conditions. We offer practical guidance on choosing data generation strategies and highlight the distinct advantages and disadvantages of each approach.
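The abstract does not specify which sensor-level augmentations were used as the classical baseline; the sketch below assumes common transforms from the HAR literature (jittering, per-channel scaling, and random 3D rotation applied to a single 3-axis accelerometer window). Function names and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def jitter(x, sigma=0.05, rng=None):
    """Add Gaussian noise to every sample of a (T, C) sensor window."""
    rng = rng or np.random.default_rng()
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1, rng=None):
    """Multiply each channel by a random factor drawn around 1."""
    rng = rng or np.random.default_rng()
    factors = rng.normal(1.0, sigma, size=(1, x.shape[1]))
    return x * factors

def rotate(x, rng=None):
    """Apply a random 3D rotation to a (T, 3) accelerometer window."""
    rng = rng or np.random.default_rng()
    # Draw a random orthogonal matrix via QR decomposition of a Gaussian matrix
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))          # fix column signs for a unique Q
    if np.linalg.det(q) < 0:          # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return x @ q.T

# Example: augment a 2-second window of 3-axis accelerometer data at 50 Hz
window = np.random.default_rng(0).normal(size=(100, 3))
augmented = rotate(scale(jitter(window)))
```

In practice such transforms would be applied independently to each of the body-worn sensor streams (here, 22 locations) before training the HAR models.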
@article{leng2025_2506.07612,
  title   = {Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques},
  author  = {Zikang Leng and Archith Iyer and Thomas Plötz},
  journal = {arXiv preprint arXiv:2506.07612},
  year    = {2025}
}