Boosting Simple Learners
We study boosting algorithms under the assumption that the given weak learner outputs hypotheses from a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are ``rules-of-thumb'' from an ``easy-to-learn class''. Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions: (i) Oracle Complexity: we show that this restriction allows us to circumvent a lower bound due to Freund and Schapire ('95, '12) on the number of calls to the weak learner. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, in order to circumvent the lower bound we propose a boosting procedure that uses more complex aggregation rules, and we show this to be necessary. (ii) Expressivity: Which tasks can be learned by boosting a base class of bounded VC dimension? Can concepts that are ``far away'' from the class be learned? Towards answering this question we identify a combinatorial-geometric parameter which quantifies the expressivity of a base class when used as part of a boosting procedure. Along the way, we establish and exploit connections with Discrepancy Theory.
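For context, the majority-vote style of aggregation that the abstract contrasts against can be sketched with standard AdaBoost over a base class of bounded VC dimension, such as one-dimensional threshold stumps (VC dimension 2). The sketch below is illustrative only and is not the procedure proposed in the paper; the dataset and helper names are invented for this example.

```python
import math

def stump_learner(xs, ys, ws):
    # Weak learner: best weighted threshold stump over a 1-D sample.
    # Threshold stumps form a simple base class of bounded VC dimension.
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thr else -sign for x in xs]
            err = sum(w for p, y, w in zip(preds, ys, ws) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    err, thr, sign = best
    return (lambda x: sign if x >= thr else -sign), err

def adaboost(xs, ys, rounds):
    # Classic AdaBoost: the final hypothesis is a weighted majority
    # vote over the weak hypotheses returned by the oracle.
    n = len(xs)
    ws = [1.0 / n] * n
    ensemble = []  # (alpha, hypothesis) pairs
    for _ in range(rounds):
        h, err = stump_learner(xs, ys, ws)
        err = max(err, 1e-12)  # guard against a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Upweight misclassified points, then renormalize.
        ws = [w * math.exp(-alpha * y * h(x)) for w, x, y in zip(ws, xs, ys)]
        z = sum(ws)
        ws = [w / z for w in ws]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# An interval concept: not realizable by any single stump,
# but learnable by a majority vote of stumps.
xs = [0, 1, 2, 3, 4, 5]
ys = [-1, -1, 1, 1, -1, -1]
clf = adaboost(xs, ys, rounds=40)
print([clf(x) for x in xs])
```

Each round here costs one oracle call to the weak learner, which is the quantity the Freund-Schapire lower bound controls for majority-vote aggregators; the paper's point is that richer aggregation rules over the same bounded-VC base class can reduce this oracle complexity.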