A Direct Sum Result for the Information Complexity of Learning

Abstract

How many bits of information are required to PAC learn a class of hypotheses of VC dimension $d$? The mathematical setting we follow is that of Bassily et al. (2018), where the value of interest is the mutual information $\mathrm{I}(S;A(S))$ between the input sample $S$ and the hypothesis outputted by the learning algorithm $A$. We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega\left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or random). Bassily et al. proved a similar (but quantitatively weaker) result for the case $d=1$. The above result is in fact a special case of a more general phenomenon we explore. We define the notion of information complexity of a given class of functions $\mathcal{H}$. Intuitively, it is the minimum amount of information that an algorithm for $\mathcal{H}$ must retain about its input to ensure consistency and properness. We prove a direct sum result for information complexity in this context; roughly speaking, the information complexity sums when combining several classes.
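
As a schematic summary of the quantities above (a sketch only; the exact quantifiers, sample size, and the combination operation on classes are as defined in the paper and are not reproduced here), the information complexity of a class $\mathcal{H}$ can be read as

\[
\mathrm{IC}(\mathcal{H}) \;=\; \inf_{A}\ \sup_{\mathcal{D}}\ \mathrm{I}\bigl(S;A(S)\bigr), \qquad S \sim \mathcal{D}^{m},
\]

where the infimum ranges over consistent and proper (possibly randomized) learners for $\mathcal{H}$ and the supremum over input distributions. In this reading, the lower bound above asserts the existence of a class of VC dimension $d$ over $\mathcal{X}$ with $\mathrm{IC}(\mathcal{H}) = \Omega\left(d \log\log \frac{|\mathcal{X}|}{d}\right)$, and the direct sum result says, roughly, that

\[
\mathrm{IC}\bigl(\mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_k\bigr) \;\gtrsim\; \sum_{i=1}^{k} \mathrm{IC}(\mathcal{H}_i),
\]

where $\oplus$ stands for the combination of classes considered in the paper (the symbol is introduced here only for illustration).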
