arXiv:2003.13764

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

30 March 2020
Anil Armagan
Guillermo Garcia-Hernando
Seungryul Baek
Shreyas Hampali
Mahdi Rad
Zhaohui Zhang
Shipeng Xie
Mingxiu Chen
Boshen Zhang
Fu Xiong
Yang Xiao
Zhiguo Cao
Junsong Yuan
Pengfei Ren
Weiting Huang
Haifeng Sun
M. Hrúz
J. Kanis
Z. Krňoul
Qingfu Wan
Shile Li
Linlin Yang
Dongheui Lee
Angela Yao
Weiguo Zhou
Sijia Mei
Yunhui Liu
Adrian Spurr
Umar Iqbal
Pavlo Molchanov
Philippe Weinzaepfel
Romain Brégier
Grégory Rogez
Vincent Lepetit
Tae-Kyun Kim
    3DH
Abstract

We study how well different types of approaches generalise in the task of 3D hand pose estimation, both in single-hand scenarios and under hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently infeasible to cover the whole space densely, despite recent efforts to collect large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or the inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS'19) to evaluate the ability of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. Specifically, HANDS'19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, in the presence or absence of objects; (b) to assess generalisation abilities along four main axes: shapes, articulations, viewpoints, and objects; and (c) to explore the use of a synthetic hand model to fill the gaps in current datasets. Over the course of the challenge, overall accuracy improved dramatically over the baseline, especially on extrapolation tasks, from 27 mm to 13 mm mean joint error. Our analyses highlight the impact of data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.
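
The abstract reports results as mean joint error in millimetres but does not spell out the formula. A minimal sketch follows, assuming the metric is the Euclidean distance between each predicted and ground-truth 3D joint, averaged over all joints and frames. The function name `mean_joint_error` and the toy coordinates are illustrative assumptions, not taken from the challenge code.

```python
import math


def mean_joint_error(pred, gt):
    """Average Euclidean distance between predicted and ground-truth joints.

    pred, gt: lists of frames; each frame is a list of (x, y, z) joint
    positions, assumed here to be in millimetres.
    This is a hypothetical helper, not the official HANDS'19 evaluation.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of frames")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("each frame must contain the same number of joints")
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # per-joint 3D distance
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Toy data: one frame with two joints (made-up values).
gt = [[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (10.0, 0.0, 12.0)]]
error_mm = mean_joint_error(pred, gt)  # per-joint errors are 5 and 12
```

Under this assumed definition, a drop from 27 mm to 13 mm means the average predicted joint lands roughly half as far from its ground-truth position.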
