On Implications of Scaling Laws on Feature Superposition

Pavan Katta
Abstract

Using results from scaling laws, this theoretical note argues that the following two statements cannot be simultaneously true:

1. The superposition hypothesis, in which sparse features are linearly represented across a layer, is a complete theory of feature representation.
2. Features are universal, meaning two models trained on the same data and achieving equal performance will learn identical features.
