State-of-the-art boosting implementations such as XGBoost and LightGBM can process large training sets extremely fast. However, this performance requires that the computer memory be large enough to hold 2-3 times the size of the training set. This paper presents an alternative approach to implementing boosted trees, which achieves a significant speedup over XGBoost and LightGBM, especially when memory is limited. The speedup is achieved through a combination of three techniques: effective sample size, early stopping, and stratified sampling, which are explained and analyzed in the paper. We describe our implementation and present experimental results to support our claims.
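Of the three techniques the abstract names, effective sample size has a widely used standard formulation, ESS = (Σ w)² / Σ w², which measures how many "effective" examples a weighted sample contains. The sketch below, assuming NumPy and a vector of boosting weights, illustrates that standard statistic only; the paper's precise definition and how it is used to trigger sampling may differ.

```python
import numpy as np

def effective_sample_size(weights):
    """Standard ESS statistic: (sum w)^2 / sum(w^2).

    Ranges from 1 (all weight on a single example) up to
    len(weights) (uniform weights). A small ESS indicates that a
    weighted sample carries far less information than its raw size,
    which is a natural signal for resampling in a boosting loop.
    """
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

# Boosting concentrates weight on hard examples, so the ESS of the
# weight distribution typically shrinks over rounds.
uniform = np.ones(1000)
skewed = np.exp(-0.01 * np.arange(1000))  # exponentially decaying weights
print(effective_sample_size(uniform))  # 1000.0
print(effective_sample_size(skewed))   # roughly 200, far below 1000
```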