Estimating the Algorithmic Variance of Randomized Ensembles via the Bootstrap

20 July 2019
Miles E. Lopes
arXiv:1907.08742
Abstract

Although bagging and random forests are among the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are few theoretical guarantees for deciding when an ensemble is "large enough", i.e. when its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of "algorithmic variance" (the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests, and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable $\mathrm{Err}_t$ denote the prediction error of a randomized ensemble of size $t$. Working under a "first-order model" for randomized ensembles, we prove that the centered law of $\mathrm{Err}_t$ can be consistently approximated by the proposed method as $t \to \infty$. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of $\mathrm{Err}_t$ are negligible.
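The core idea of bootstrapping over ensemble members can be illustrated with a minimal sketch. Everything below is a toy stand-in, not the paper's implementation: the per-point vote model, the labels, and all names are hypothetical, and the extrapolation technique that makes the paper's method cheap is omitted. The sketch only shows the basic step of resampling the $t$ fitted members with replacement to approximate the algorithmic fluctuations of $\mathrm{Err}_t$ on a fixed test set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a randomized ensemble: each "member" casts a 0/1 vote
# on each test point. In the paper's setting the members would be the
# trees of a bagging or random-forest ensemble; here we simulate them
# with hypothetical per-point vote probabilities.
t, n_test = 50, 200
p = rng.uniform(0.3, 0.7, size=n_test)              # illustrative vote rates
votes = (rng.random((t, n_test)) < p).astype(int)   # votes[i, j]: member i on point j
y = (p > 0.5).astype(int)                           # illustrative true labels

def error(v):
    """Misclassification rate of the majority vote over the rows of v."""
    pred = (v.mean(axis=0) > 0.5).astype(int)
    return float(np.mean(pred != y))

# Bootstrap over ensemble members: resample the t members with replacement
# and recompute the ensemble's error, approximating the distribution of
# the algorithmic fluctuations of Err_t around its mean.
B = 500
boot = np.array([error(votes[rng.integers(0, t, size=t)]) for _ in range(B)])
sd_hat = boot.std(ddof=1)   # estimated algorithmic std. dev. of Err_t
```

In practice one would compare `sd_hat` against a tolerance: if the estimated algorithmic standard deviation is negligible relative to the error itself, the ensemble can be considered "large enough" in the sense discussed in the abstract.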
