
Mitigating sampling bias in risk-based active learning via an EM algorithm

Abstract

Risk-based active learning is an approach to developing statistical classifiers for online decision-support. In this approach, data-label querying is guided according to the expected value of perfect information for incipient data points. For SHM applications, the value of information is evaluated with respect to a maintenance decision process, and the data-label querying corresponds to the inspection of a structure to determine its health state. Sampling bias is a known issue within active-learning paradigms; it occurs when an active learning process over- or under-samples specific regions of a feature space, resulting in a training set that is not representative of the underlying distribution. This bias ultimately degrades decision-making performance and, as a consequence, leads to unnecessary costs being incurred. The current paper outlines a risk-based approach to active learning that utilises a semi-supervised Gaussian mixture model. The semi-supervised approach counteracts sampling bias by incorporating pseudo-labels for unlabelled data via an EM algorithm. The approach is demonstrated on a numerical example representative of the decision processes found in SHM.
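To make the idea concrete, the sketch below shows a generic semi-supervised Gaussian mixture model fitted via EM, in which labelled (inspected) points keep fixed one-hot responsibilities while unlabelled points receive soft pseudo-labels at each E-step. This is a minimal illustration of the general technique, not the authors' implementation; the function name, the regularisation constant, and the initialisation from labelled data alone are assumptions made for the example.

```python
# Minimal sketch of a semi-supervised GMM trained with EM.
# Labelled data contribute fixed one-hot responsibilities; unlabelled data
# contribute soft pseudo-labels updated at every E-step.
import numpy as np
from scipy.stats import multivariate_normal


def semi_supervised_gmm(X_lab, y_lab, X_unl, K, n_iter=50):
    X = np.vstack([X_lab, X_unl])
    n, d = X.shape

    # Fixed one-hot responsibilities for the labelled data.
    R_lab = np.eye(K)[y_lab]

    # Initialise parameters from the labelled data alone (an assumption).
    pi = R_lab.mean(axis=0)
    mu = np.array([X_lab[y_lab == k].mean(axis=0) for k in range(K)])
    cov = np.array([np.cov(X_lab[y_lab == k].T) + 1e-6 * np.eye(d)
                    for k in range(K)])

    for _ in range(n_iter):
        # E-step: soft pseudo-labels (responsibilities) for unlabelled data.
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X_unl, mu[k], cov[k])
            for k in range(K)])
        R_unl = dens / dens.sum(axis=1, keepdims=True)
        R = np.vstack([R_lab, R_unl])

        # M-step: update mixture weights, means, and covariances using
        # both labelled and pseudo-labelled data.
        Nk = R.sum(axis=0)
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = ((R[:, k, None] * diff).T @ diff / Nk[k]
                      + 1e-6 * np.eye(d))

    return pi, mu, cov
```

In a risk-based active-learning loop, a model of this form would plausibly be refitted after each inspection, so that the unlabelled pool continues to inform the class-conditional densities and the labelled set's sampling bias is diluted.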
