Exchangeable random variables form an important and well-studied generalization of i.i.d. variables; however, simple examples show that no nontrivial concept or function classes are PAC learnable under general exchangeable data inputs $X_1, X_2, \ldots$. Inspired by the work of Berti and Rigo on a Glivenko--Cantelli theorem for exchangeable inputs, we propose a new paradigm, adequate for learning from exchangeable data: predictive PAC learnability. A learning rule $\mathcal{L}$ for a function class $\mathscr{F}$ is predictive PAC if for every $\varepsilon, \delta > 0$ and each function $f \in \mathscr{F}$, whenever $n \geq s(\varepsilon, \delta)$, we have with confidence $1 - \delta$ that the expected difference between $f(X_{n+1})$ and the image of the labelled sample $f \upharpoonright \{X_1, \ldots, X_n\}$ under $\mathcal{L}$, evaluated at $X_{n+1}$, does not exceed $\varepsilon$ conditionally on $X_1, \ldots, X_n$. Thus, instead of learning the function $f$ as such, we are learning to a given accuracy the predictive behaviour of $f$ at the future points $X_i(\omega)$, $i > n$, of the sample path. Using de Finetti's theorem, we show that if a universally separable function class $\mathscr{F}$ is distribution-free PAC learnable under i.i.d. inputs, then it is distribution-free predictive PAC learnable under exchangeable inputs, with a slightly worse sample complexity.
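In display form, the predictive PAC condition reads roughly as follows (a sketch in our own shorthand: $\sigma_n$ abbreviates the labelled sample $f \upharpoonright \{X_1, \ldots, X_n\}$, and $\mathcal{L}(\sigma_n)$ is the hypothesis the rule outputs on it):

\[
n \geq s(\varepsilon, \delta) \;\Longrightarrow\; \Pr\Bigl[ \mathbb{E}\bigl[ \bigl| f(X_{n+1}) - \mathcal{L}(\sigma_n)(X_{n+1}) \bigr| \bigm| X_1, \ldots, X_n \bigr] \leq \varepsilon \Bigr] \geq 1 - \delta.
\]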