Automatic Identification of Non-Meaningful Body-Movements and What It Reveals About Humans

We present a framework for identifying whether a public speaker's body movements are meaningful or non-meaningful ("mannerisms") in the context of their speech. From a dataset of 84 public speaking videos from 28 individuals, we extract 314 unique body movement patterns (e.g., pacing, gesturing, and shifting body weight). Online workers and the speakers themselves annotated the meaningfulness of each pattern. We extract five types of features from the audio-video recordings: disfluency, prosody, body movement, facial, and lexical. Linear classifiers predict the annotations with an AUC of up to 0.82. Analysis of the classifier weights shows that the classifier places larger weights on the lexical features when predicting self-annotations and, in contrast, larger weights on the prosody features when predicting audience annotations. This tentatively suggests that public speakers tend to focus more on verbal features when evaluating their own performances, whereas the audience tends to focus more on the non-verbal aspects of the speech. The dataset and code associated with this work have been released for peer review and further analysis.
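The weight analysis described in the abstract can be illustrated with a minimal sketch. It is not the released code. It assumes a plain logistic regression trained by gradient descent as a stand-in for the unspecified linear classifiers, and it uses synthetic data. The five group names come from the abstract; the column indices in `FEATURE_GROUPS`, the helper names, and all hyperparameters are hypothetical.

```python
import math
import random

# Hypothetical column layout: which feature columns belong to which of the
# five feature types named in the abstract.
FEATURE_GROUPS = {
    "disfluency": [0],
    "prosody": [1, 2],
    "body_movement": [3],
    "facial": [4],
    "lexical": [5, 6],
}
N_FEATURES = 7


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_scores(X, w, b):
    # Probability that each movement pattern is labeled "meaningful".
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def train_logistic(X, y, lr=0.2, epochs=300, l2=0.01):
    # Batch gradient descent on L2-regularized logistic loss (assumed setup).
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [p - yi for p, yi in zip(predict_scores(X, w, b), y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errors, X)) / n
                  for j in range(len(w))]
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * sum(errors) / n
    return w, b


def auc(scores, labels):
    # AUC as the fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half a win.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def group_weight_share(w, groups):
    # Share of total absolute weight that falls in each feature group.
    total = sum(abs(wj) for wj in w) or 1.0
    return {name: sum(abs(w[j]) for j in idx) / total
            for name, idx in groups.items()}


def make_synthetic(n, driver, seed):
    # Synthetic annotations driven by one feature group plus noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(N_FEATURES)]
        signal = sum(xi[j] for j in FEATURE_GROUPS[driver])
        X.append(xi)
        y.append(1 if signal + rng.gauss(0, 0.5) > 0 else 0)
    return X, y


if __name__ == "__main__":
    # Two synthetic annotation sources, each driven by a different group.
    for source, driver in [("self", "lexical"), ("audience", "prosody")]:
        X, y = make_synthetic(200, driver, seed=0)
        w, b = train_logistic(X, y)
        print(source, round(auc(predict_scores(X, w, b), y), 2),
              group_weight_share(w, FEATURE_GROUPS))
```

Comparing absolute weights across groups is only meaningful when the features share a common scale, which the synthetic data guarantees here. Real features would need standardizing first, and the AUC should be computed on held-out data rather than the training set used in this sketch.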