ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2206.10146
11
9

KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

21 June 2022
Xuanhan Wang
Jingkuan Song
Xiaojia Chen
Lechao Cheng
Lianli Gao
Hengtao Shen
ArXivPDFHTML
Abstract

Part-level attribute parsing is a fundamental but challenging task, which requires the region-level visual understanding to provide explainable details of body parts. Most existing approaches address this problem by adding a regional convolutional neural network (RCNN) with an attribute prediction head to a two-stage detector, in which attributes of body parts are identified from local-wise part boxes. However, local-wise part boxes with limit visual clues (i.e., part appearance only) lead to unsatisfying parsing results, since attributes of body parts are highly dependent on comprehensive relations among them. In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledges, including implicit knowledge (e.g., the attribute ``above-the-hip'' for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., the part of ``shorts'' cannot have the attribute of ``hoodie'' or ``lining''). Specifically, the KE-RCNN consists of two novel components, i.e., Implicit Knowledge based Encoder (IK-En) and Explicit Knowledge based Decoder (EK-De). The former is designed to enhance part-level representation by encoding part-part relational contexts into part boxes, and the latter one is proposed to decode attributes with a guidance of prior knowledge about \textit{part-attribute} relations. In this way, the KE-RCNN is plug-and-play, which can be integrated into any two-stage detectors, e.g., Attribute-RCNN, Cascade-RCNN, HRNet based RCNN and SwinTransformer based RCNN. Extensive experiments conducted on two challenging benchmarks, e.g., Fashionpedia and Kinetics-TPS, demonstrate the effectiveness and generalizability of the KE-RCNN. In particular, it achieves higher improvements over all existing methods, reaching around 3% of AP on Fashionpedia and around 4% of Acc on Kinetics-TPS.

View on arXiv
Comments on this paper