Self-training Converts Weak Learners to Strong Learners in Mixture Models

Abstract

We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions, among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together, our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.
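The sketch below is an illustrative toy version of the two-stage procedure described in the abstract, not the paper's exact algorithm or analysis: it instantiates the mixture as a symmetric two-component Gaussian mixture and uses hypothetical choices for the dimension, sample sizes, step sizes, and iteration counts. Stage 1 fits a weak pseudolabeler by gradient descent on the logistic loss over a small labeled sample; Stage 2 iterates self-training on unlabeled data with pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$.

```python
# Illustrative sketch only (assumed settings, not the paper's constants):
# self-training on a symmetric two-component Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
d, n_labeled, n_unlabeled = 50, 200, 20000
mu = rng.normal(size=d)
mu /= np.linalg.norm(mu)  # Bayes-optimal direction (up to sign)

def sample(n):
    # x = y * mu + standard Gaussian noise, labels y uniform on {-1, +1}
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + rng.normal(size=(n, d))
    return x, y

def logistic_grad(beta, x, y):
    # gradient of (1/n) * sum_i log(1 + exp(-y_i <beta, x_i>))
    margins = y * (x @ beta)
    return -(x * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)

# Stage 1: weak pseudolabeler from O(d) labeled examples via gradient descent.
x_lab, y_lab = sample(n_labeled)
beta = np.zeros(d)
for _ in range(200):
    beta -= 0.5 * logistic_grad(beta, x_lab, y_lab)

# Stage 2: iterative self-training on unlabeled data with pseudolabels
# sgn(<beta_t, x>), refitting the logistic loss against those pseudolabels.
x_unlab, _ = sample(n_unlabeled)
for _ in range(50):
    pseudo = np.sign(x_unlab @ beta)
    beta -= 0.1 * logistic_grad(beta, x_unlab, pseudo)

angle_err = np.arccos(abs(beta @ mu) / np.linalg.norm(beta))
print(f"angle to Bayes-optimal direction: {angle_err:.3f} rad")
```

In this toy setting the angle between $\boldsymbol{\beta}_t$ and the Bayes-optimal direction shrinks as more unlabeled data is used, which is the qualitative behavior the abstract's $\tilde O(d/\varepsilon^2)$ unlabeled-sample bound formalizes.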
