Linear models based on noisy data and the Frisch scheme

We address the problem of identifying linear relations among variables based on noisy measurements. This is, of course, a central question in problems involving "Big Data." A key assumption is often that the measurement errors in each variable are independent. This precise formulation has its roots in the work of Charles Spearman in 1904 and of Ragnar Frisch in the 1930s. Topics such as errors-in-variables, factor analysis, and instrumental variables all refer to alternative formulations of the problem of how to account for the anticipated way that noise enters the data. In the present paper we begin by describing the basic theory and provide alternative modern proofs of some key results. We then consider certain generalizations of the theory and apply certain novel numerical techniques to the problem. A central role is played by the Frisch-Kalman dictum, which seeks a noise contribution that allows a maximal set of simultaneous linear relations among the noise-free variables: a rank minimization problem. In the years since Frisch's original formulation, there have been several insights, including trace minimization as a convenient heuristic to replace rank minimization. We discuss convex relaxations and certificates guaranteeing global optimality. A point of view complementary to the Frisch-Kalman dictum is introduced, in which models lead to a min-max quadratic estimation error for the error-free variables. Points of contact between the two formalisms are discussed, and various alternative regularization schemes are indicated.
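To make the trace heuristic concrete, here is a minimal sketch under stated assumptions. It assumes a known positive-definite covariance Σ of the noisy data and the classical Frisch setting Σ = Σ̂ + D, with Σ̂ ⪰ 0 the covariance of the noise-free variables and D diagonal and nonnegative. Replacing "minimize rank(Σ̂)" with the convex surrogate "minimize trace(Σ̂)" is equivalent to maximizing trace(D) subject to Σ − D ⪰ 0 and D ⪰ 0. The helper name `min_trace_noise`, its parameters, and the log-barrier Newton solver are hypothetical illustrations chosen for this sketch, not the numerical method of the paper.

```python
import numpy as np


def _is_feasible(sigma, d):
    """True if d > 0 and sigma - diag(d) is positive definite (Cholesky succeeds)."""
    if np.any(d <= 0):
        return False
    try:
        np.linalg.cholesky(sigma - np.diag(d))
        return True
    except np.linalg.LinAlgError:
        return False


def min_trace_noise(sigma, t0=1.0, t_min=1e-8, mu=0.2, max_newton=100):
    """Hypothetical helper: approximately maximize sum(d) s.t. sigma - diag(d) >= 0, d >= 0.

    Log-barrier path following: for decreasing t, maximize
        sum(d) + t * (log det(sigma - diag(d)) + sum(log d))
    with damped Newton steps. The result is strictly feasible and,
    at an exact center, within 2 * n * t of the optimal trace.
    """
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.shape[0]
    lam_min = np.linalg.eigvalsh(sigma)[0]
    if lam_min <= 0:
        raise ValueError("sigma must be positive definite")
    d = np.full(n, 0.5 * lam_min)  # strictly feasible starting point
    t = t0
    while t > t_min:
        for _ in range(max_newton):
            s_inv = np.linalg.inv(sigma - np.diag(d))
            # Gradient and (negative definite) Hessian of the barrier objective.
            grad = 1.0 - t * np.diag(s_inv) + t / d
            hess = -t * (s_inv * s_inv) - t * np.diag(1.0 / d**2)
            step = np.linalg.solve(hess, -grad)  # Newton step for a concave objective
            dec2 = (grad @ step) / t  # squared Newton decrement of the scaled barrier
            if dec2 < 1e-10:
                break  # approximately centered for this t
            # Damped step for a self-concordant barrier: stays inside the domain
            # in exact arithmetic; the loop below guards against round-off.
            s = 1.0 / (1.0 + np.sqrt(dec2))
            while not _is_feasible(sigma, d + s * step):
                s *= 0.5
            d = d + s * step
        t *= mu
    return d


if __name__ == "__main__":
    # Synthetic covariance: one noise-free linear factor plus independent noise.
    a = np.array([1.0, 0.8, 0.6])
    sigma = np.outer(a, a) + np.diag([0.2, 0.3, 0.4])
    d = min_trace_noise(sigma)
    print("noise variances:", d)
    print("eigenvalues of sigma - diag(d):", np.linalg.eigvalsh(sigma - np.diag(d)))
```

The hand-rolled barrier keeps the sketch dependent on NumPy alone; a general-purpose semidefinite programming tool would express the same problem more compactly. Because the barrier keeps Σ − diag(d) strictly positive definite, the returned Σ̂ is only nearly singular, so counting its eigenvalues below a chosen tolerance gives a rough estimate of how many linear relations the trace heuristic has found. As the abstract notes, trace minimization is a heuristic: it need not attain the minimum rank sought by the Frisch-Kalman dictum.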