Kinetic Theory for Residual Neural Networks

Foundations of Data Science (FDS), 2020
Abstract

Deep residual neural networks perform very well in many data science applications. We use kinetic theory to improve the understanding of existing methods. A simplified residual neural network (SimResNet) model, in which each layer consists of at most one neuron per input dimension, is studied in the limit of infinitely many inputs. This leads to a Vlasov-type equation for the distribution of the data, which we analyze with respect to sensitivities and steady states. In the simple case of a linear activation function and one-dimensional input data, we can study properties of the associated moment model. Further, a modification of the microscopic dynamics leads to a Fokker-Planck-type formulation of the SimResNet, in which the concept of network training is replaced by the task of fitting distributions. The analysis is validated by numerical simulations; in particular, results on clustering and regression problems are presented.
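As a rough illustration of the simplified model described above, the following sketch implements a residual update with at most one neuron per input dimension, so each layer acts componentwise. The step size `h`, the layer count, and all weight values are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def simresnet_forward(x0, weights, biases, h=0.1, activation=np.tanh):
    """Forward pass of a SimResNet-style model: each layer has at most one
    neuron per input dimension, so the residual update is componentwise:
        x_{k+1} = x_k + h * sigma(w_k * x_k + b_k)
    Here h is an assumed step size; letting the number of layers grow is
    what connects this discrete update to a continuous-time description.
    """
    x = np.asarray(x0, dtype=float)
    for w, b in zip(weights, biases):
        x = x + h * activation(w * x + b)
    return x

# Example: one-dimensional input with a linear (identity) activation,
# matching the simple case analyzed in the abstract; the weights and
# biases below are placeholder values.
layers = 5
w = np.full(layers, 0.5)
b = np.zeros(layers)
out = simresnet_forward(np.array([1.0]), w, b, h=0.1, activation=lambda z: z)
```

With the identity activation and zero bias, each layer multiplies the state by (1 + h*w), so the output after k layers is x0 * (1 + h*w)**k; this closed form is what makes moment analysis tractable in the linear case.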
