I Dropped a Neural Net

Hyunwoo Park
Main: 9 pages; 10 figures; Bibliography: 1 page; 3 tables; Appendix: 3 pages

Abstract

A recent Dwarkesh Patel podcast with John Collison and Elon Musk featured an interesting puzzle from Jane Street: they trained a neural net, shuffled all 96 layers, and asked solvers to put them back in order. Given unlabelled layers of a Residual Network and its training dataset, we recover the exact ordering of the layers. The problem decomposes into pairing each block's input and output projections ($48!$ possibilities) and ordering the reassembled blocks ($48!$ possibilities), for a combined search space of $(48!)^2 \approx 10^{122}$, which is more than the number of atoms in the observable universe. We show that stability conditions during training, such as dynamic isometry, leave the product $W_{\text{out}} W_{\text{in}}$ for correctly paired layers with a negative diagonal structure, allowing us to use the diagonal dominance ratio as a signal for pairing. For ordering, we seed-initialize with a rough proxy such as delta-norm or $\|W_{\text{out}}\|_F$, then hill-climb to zero mean squared error.
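The pairing signal described in the abstract can be illustrated with a minimal toy sketch. It is not the paper's implementation. Several things here are assumptions made purely for illustration: the definition of the ratio (mean absolute diagonal over mean absolute off-diagonal entry), the greedy matching step, the helper names `diag_dominance_ratio` and `pair_layers`, and the synthetic weights, which are constructed so that each true $W_{\text{out}}$ is roughly $-W_{\text{in}}^\top$ plus noise in order to mimic a negative diagonal in $W_{\text{out}} W_{\text{in}}$.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def diag_dominance_ratio(m):
    """Mean |diagonal| over mean |off-diagonal| of a square matrix.

    Assumed definition for illustration; the paper may define it differently.
    """
    d = len(m)
    diag = sum(abs(m[i][i]) for i in range(d)) / d
    off = sum(abs(m[i][j]) for i in range(d) for j in range(d) if i != j) / (d * d - d)
    return diag / off


def pair_layers(w_ins, w_outs):
    """Greedily match each W_in to the unused W_out with the highest ratio."""
    candidates = sorted(
        (
            (diag_dominance_ratio(matmul(w_out, w_in)), i, j)
            for i, w_in in enumerate(w_ins)
            for j, w_out in enumerate(w_outs)
        ),
        reverse=True,
    )
    pairing, used = {}, set()
    for _, i, j in candidates:
        if i not in pairing and j not in used:
            pairing[i] = j
            used.add(j)
    return pairing


# Synthetic toy blocks (hypothetical sizes, not the 48 blocks from the puzzle).
rng = random.Random(0)
n_blocks, d_model, d_hidden = 5, 8, 32

# W_in has shape (d_hidden, d_model).
w_ins = [
    [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_hidden)]
    for _ in range(n_blocks)
]
# Toy assumption: W_out (d_model, d_hidden) is about -W_in^T plus small noise,
# so W_out @ W_in has a strongly negative diagonal for the true pair.
true_w_outs = [
    [[-w_in[r][c] + 0.1 * rng.gauss(0, 1) for r in range(d_hidden)] for c in range(d_model)]
    for w_in in w_ins
]

# Shuffle the output projections: w_outs[j] belongs to block shuffle[j].
shuffle = list(range(n_blocks))
rng.shuffle(shuffle)
w_outs = [true_w_outs[k] for k in shuffle]

pairing = pair_layers(w_ins, w_outs)
recovered = all(shuffle[pairing[i]] == i for i in range(n_blocks))
print("pairing recovered:", recovered)
```

In this toy setting the correct pair scores far above mismatched pairs, so greedy matching is enough; on real weights an optimal assignment solver may be a safer choice than the greedy step used here.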
