I Dropped a Neural Net
A recent Dwarkesh Patel podcast with John Collison and Elon Musk featured an interesting puzzle from Jane Street: they trained a neural net, shuffled all 96 layers, and asked solvers to put them back in order. Given the unlabelled layers of a Residual Network and its training dataset, we recover the exact ordering of the layers. The problem decomposes into pairing each block's input and output projections ($48!$ possibilities) and ordering the reassembled blocks ($48!$ possibilities), for a combined search space of $(48!)^2 \approx 10^{122}$, which is more than the number of atoms in the observable universe. We show that stability conditions during training, such as dynamic isometry, leave the product of the output and input projection matrices of a correctly paired block with a negative diagonal structure, which lets us use the diagonal dominance ratio as a pairing signal. For ordering, we seed the search with a rough proxy such as delta-norm, then hill-climb to zero mean squared error.
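The pairing idea can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the names `neg_diag_score` and `make_toy_block` are hypothetical, the score (negative trace divided by the sum of absolute off-diagonal entries) is only one plausible form of a diagonal dominance ratio, the toy weights are synthetic and built so that the output projection is close to the negated transpose of the input projection, and the greedy per-row matching stands in for whatever assignment procedure a full solver would use. The count assumes the 96 layers form 48 blocks.

```python
import math
import random

# Search-space size, assuming the 96 layers form 48 (input, output) blocks.
search_space = math.factorial(48) ** 2   # pairings times orderings


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def neg_diag_score(w_out, w_in):
    """Illustrative score: -trace(P) / sum of |off-diagonal entries|, P = w_out @ w_in.

    Large and positive when P has a strong negative diagonal.
    """
    p = matmul(w_out, w_in)
    d = len(p)
    trace = sum(p[i][i] for i in range(d))
    off = sum(abs(p[i][j]) for i in range(d) for j in range(d) if i != j)
    return -trace / off


def make_toy_block(d, h, rng):
    """Toy block: w_in is (h x d) Gaussian; w_out is (d x h), close to -w_in^T / h."""
    w_in = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
    w_out = [[-w_in[k][i] / h + 0.01 * rng.gauss(0, 1) for k in range(h)]
             for i in range(d)]
    return w_in, w_out


rng = random.Random(0)
n_blocks, d, h = 5, 8, 64
blocks = [make_toy_block(d, h, rng) for _ in range(n_blocks)]
w_ins = [b[0] for b in blocks]

# Shuffle the output projections: shuffled_outs[j] belongs to block perm[j].
perm = list(range(n_blocks))
rng.shuffle(perm)
shuffled_outs = [blocks[p][1] for p in perm]

# Greedy pairing: for each input projection, pick the best-scoring output projection.
recovered = [
    max(range(n_blocks), key=lambda j: neg_diag_score(shuffled_outs[j], w_ins[i]))
    for i in range(n_blocks)
]

print("search space exceeds 10^80:", search_space > 10**80)
print("pairing recovered:", [perm[j] for j in recovered] == list(range(n_blocks)))
```

In this toy setting the correct pair scores well above any mismatched pair, because a mismatched product has no systematic diagonal. Real trained weights would be noisier, so a global assignment over the full score matrix is likely a safer choice than the greedy match used here.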