Egocentric Hand-object Interaction Detection and Application

Abstract

In this paper, we present a method for detecting hand-object interaction (HOI) from an egocentric perspective. In contrast to massively data-driven, discriminator-based methods such as \cite{Shan20}, we propose a novel workflow that utilises hand and object cues. Specifically, we train networks that predict hand pose, hand mask and in-hand object mask, and use these outputs jointly to predict the hand-object interaction status. We compare our method with the most recent work of Shan et al. \cite{Shan20} on selected images from the EPIC-KITCHENS \cite{damen2018scaling} dataset and achieve $89\%$ accuracy on HOI detection, which is comparable to Shan's ($92\%$). In terms of real-time performance, however, on the same machine our method runs at over $\textbf{30}$ FPS, which is much more efficient than Shan's ($\textbf{1}\sim\textbf{2}$ FPS). Furthermore, our approach allows us to segment script-less activities by using the detected HOI status to extract the relevant frames. We achieve $\textbf{68.2\%}$ and $\textbf{82.8\%}$ F1 scores on the GTEA \cite{fathi2011learning} and UTGrasp \cite{cai2015scalable} datasets respectively, both of which are comparable to the SOTA methods.
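To illustrate how a per-frame HOI status could drive activity segmentation, the following is a minimal sketch. It is not the paper's implementation. It assumes each frame already has a boolean interaction flag, and the function name `segment_interactions` and the `min_len` parameter are hypothetical. Consecutive interacting frames are grouped into segments, and segments shorter than `min_len` are dropped.

```python
from typing import List, Tuple


def segment_interactions(status: List[bool], min_len: int = 1) -> List[Tuple[int, int]]:
    """Group consecutive interacting frames into (start, end) index pairs.

    `status[i]` is True when frame i is flagged as hand-object interaction.
    Both indices are inclusive. Runs shorter than `min_len` frames are
    discarded (an illustrative filter, not taken from the paper).
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current run of True flags began
    for i, flag in enumerate(status):
        if flag and start is None:
            start = i  # a new interaction run begins
        elif not flag and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None  # the run has ended
    # Close a run that extends to the final frame.
    if start is not None and len(status) - start >= min_len:
        segments.append((start, len(status) - 1))
    return segments


# Hypothetical per-frame predictions for a seven-frame clip.
flags = [False, True, True, True, False, True, False]
segments = segment_interactions(flags, min_len=2)
```

With `min_len=2`, the single-frame run at index 5 is filtered out, which is one simple way to suppress isolated detections when forming activity segments.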
