To formally understand the power of neural computing, we first need to crack the frontier of threshold circuits with two and three layers, a regime that has been surprisingly intractable to analyze. We prove the first super-linear gate lower bounds and the first super-quadratic wire lower bounds for depth-two linear threshold circuits with arbitrary weights, and for depth-three majority circuits, computing an explicit function.

We prove that for all $\epsilon \gg \sqrt{\log(n)/n}$, the linear-time computable Andreev's function cannot be computed on a $(1/2 + \epsilon)$-fraction of $n$-bit inputs by depth-two linear threshold circuits of $o(\epsilon^3 n^{3/2}/\log^3 n)$ gates, nor can it be computed with $o(\epsilon^3 n^{5/2}/\log^{7/2} n)$ wires. This establishes an average-case ``size hierarchy'' for threshold circuits, as Andreev's function is computable by uniform depth-two circuits of $\tilde{O}(n^3)$ linear threshold gates, and by uniform depth-three circuits of $\tilde{O}(n)$ majority gates.

We present a new function in $P$ based on small-biased sets, which we prove cannot be computed by a majority vote of depth-two linear threshold circuits with $o(n^{3/2}/\log^3 n)$ gates, nor with $o(n^{5/2}/\log^{7/2} n)$ wires.

We give tight average-case (gate and wire) complexity results for computing PARITY with depth-two threshold circuits; the answer turns out to be the same as for depth-two majority circuits.

The key is a new random restriction lemma for linear threshold functions. Our main analytical tool is the Littlewood-Offord Lemma from additive combinatorics.
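For context, here is one standard formulation of the gate model and of the lemma invoked above; the notation $w_i$, $\theta$, and $I$ is illustrative and not fixed by the abstract. A linear threshold gate on inputs $x \in \{-1,1\}^n$ computes
\[
  f(x) \;=\; \mathrm{sign}\Big(\textstyle\sum_{i=1}^{n} w_i x_i - \theta\Big)
\]
for arbitrary real weights $w_1,\dots,w_n$ and threshold $\theta$; a majority gate is the special case $w_1 = \cdots = w_n = 1$. The Littlewood-Offord Lemma is the anti-concentration statement that if $|w_i| \ge 1$ for all $i$ and $x$ is uniform over $\{-1,1\}^n$, then for every interval $I$ of length $2$,
\[
  \Pr_{x}\Big[\textstyle\sum_{i=1}^{n} w_i x_i \in I\Big] \;\le\; \binom{n}{\lfloor n/2 \rfloor}\, 2^{-n} \;=\; O(1/\sqrt{n}).
\]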