Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees

Abstract

We give the first algorithm that maintains an approximate decision tree over an arbitrary sequence of insertions and deletions of labeled examples, with strong guarantees on the worst-case running time per update request. For instance, we show how to maintain a decision tree where every vertex has Gini gain within an additive $\alpha$ of the optimum by performing $O\big(\frac{d\,(\log n)^4}{\alpha^3}\big)$ elementary operations per update, where $d$ is the number of features and $n$ the maximum size of the active set (the net result of the update requests). We give similar bounds for the information gain and the variance gain. In fact, all these bounds are corollaries of a more general result, stated in terms of decision rules -- functions that, given a set $S$ of labeled examples, decide whether to split $S$ or predict a label. Decision rules give a unified view of greedy decision tree algorithms regardless of the example and label domains, and lead to a general notion of $\epsilon$-approximate decision trees that, for natural decision rules such as those used by ID3 or C4.5, implies the gain approximation guarantees above. The heart of our work provides a deterministic algorithm that, given any decision rule and any $\epsilon > 0$, maintains an $\epsilon$-approximate tree using $O\!\left(\frac{d\, f(n)}{n} \operatorname{poly}\frac{h}{\epsilon}\right)$ operations per update, where $f(n)$ is the complexity of evaluating the rule over a set of $n$ examples and $h$ is the maximum height of the maintained tree.
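For context, the Gini gain referenced above is the standard impurity-reduction quantity used by CART-style greedy learners; assuming the paper uses the usual definition (not restated in this abstract), for a set $S$ of labeled examples split into parts $S_1, \dots, S_k$ it reads

\[
G(S) = 1 - \sum_{c} p_c^2,
\qquad
\mathrm{GiniGain}(S; S_1, \dots, S_k) = G(S) - \sum_{i=1}^{k} \frac{|S_i|}{|S|}\, G(S_i),
\]

where $p_c$ is the fraction of examples in $S$ carrying label $c$. Under this reading, a vertex with Gini gain within an additive $\alpha$ of the optimum is one whose chosen split loses at most $\alpha$ of the gain achieved by the best possible split of the same set.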
