Decomposing a deep neural network's learned representations into interpretable features could greatly enhance its safety and reliability. To better understand features, we adopt a geometric perspective, viewing them as a learned coordinate system for mapping an embedded data distribution. We motivate a model of a generic data distribution as a random lattice and analyze its properties using percolation theory. Learned features are categorized into context, component, and surface features. The model is qualitatively consistent with recent findings in mechanistic interpretability and suggests directions for future research.
@article{brill2025_2504.20197,
  title   = {Representation Learning on a Random Lattice},
  author  = {Aryeh Brill},
  journal = {arXiv preprint arXiv:2504.20197},
  year    = {2025}
}
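The percolation-theoretic setup the abstract alludes to can be illustrated with a small simulation. The sketch below is a hedged stand-in, not the paper's actual construction: it models a "random lattice" over an embedded data distribution as a random geometric graph (points sampled uniformly in the unit square, linked when within a distance r of each other) and tracks the fraction of points in the largest connected cluster as r grows. The function name giant_component_fraction, the point count, and the radius sweep are all illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def giant_component_fraction(points, radius):
    """Fraction of points in the largest cluster when points
    within `radius` of each other are linked by an edge."""
    n = len(points)
    # All pairs closer than `radius`, as an (m, 2) index array.
    pairs = cKDTree(points).query_pairs(radius, output_type="ndarray")
    if len(pairs) == 0:
        return 1.0 / n  # every point is its own cluster
    adj = coo_matrix(
        (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n)
    )
    _, labels = connected_components(adj, directed=False)
    return np.bincount(labels).max() / n

rng = np.random.default_rng(0)
points = rng.random((4000, 2))  # Poisson-like point cloud in the unit square

# Sweep the linking radius: the giant-component fraction should jump
# sharply near the continuum-percolation threshold.
for r in [0.005, 0.010, 0.015, 0.020, 0.025, 0.030]:
    frac = giant_component_fraction(points, r)
    print(f"r = {r:.3f}: giant component fraction = {frac:.3f}")
```

On this cloud, the largest cluster grows from a negligible fraction of points to nearly all of them as r crosses roughly 0.019, consistent with the known 2D continuum-percolation transition at a critical mean degree of about 4.5; this kind of sharp connectivity transition is the basic phenomenon percolation theory supplies for analyzing such random structures.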