BEAR: A Video Dataset For Fine-grained Behaviors Recognition Oriented with Action and Environment Factors

Behavior recognition is an important task in video representation learning, and it depends on learning features that are effective for telling behaviors apart. Recently, researchers have begun to study fine-grained behavior recognition, which presents models with similar behaviors and so pushes them to attend to finer details and learn features that distinguish those behaviors. However, previous fine-grained behavior datasets control only part of the information to be similar, which leads to an unfair and incomplete evaluation of existing methods. In this work, we develop a new fine-grained video behavior dataset, named BEAR, which provides fine-grained (i.e., similar) behaviors organized around the two primary factors that define a behavior: Environment and Action. It comprises two fine-grained behavior protocols, Fine-grained Behavior with Similar Environments and Fine-grained Behavior with Similar Actions, along with multiple sub-protocols covering different scenarios. With this new dataset, we conduct multiple experiments on different behavior recognition models. Our study primarily explores the impact of input modality, a critical element in studying the environment-based and action-based aspects of behavior recognition. The experimental results yield insights with implications for further research.
@article{hu2025_2503.20209,
  title={BEAR: A Video Dataset For Fine-grained Behaviors Recognition Oriented with Action and Environment Factors},
  author={Chengyang Hu and Yuduo Chen and Lizhuang Ma},
  journal={arXiv preprint arXiv:2503.20209},
  year={2025}
}