
Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity

Oluwadamilola Fasina
Ruben V.C. Pohle
Pei-Chun Su
Ronald R. Coifman
Main: 10 pages, 6 figures, 2 tables; Bibliography: 3 pages; Appendix: 3 pages
Abstract

We examine the intrinsic (within an attention head) and extrinsic (among attention heads) structure of the self-attention mechanism in transformers. Theoretical evidence for the invariance of the self-attention mechanism to the softmax activation is obtained by appealing to paradifferential calculus (and is supported by computational examples); this invariance relies on the intrinsic organization of the attention heads. Furthermore, we use an existing methodology for the hierarchical organization of tensors to examine network structure, constructing hierarchical partition trees with respect to the query, key, and head axes of network 3-tensors. Such an organization is consequential because it allows one to profitably execute common signal processing tasks on a geometry where the organized network 3-tensors exhibit regularity. We exemplify this qualitatively, by visualizing the hierarchical organization of the tree comprised of attention heads and the diffusion map embeddings, and quantitatively, by investigating network sparsity through the expansion coefficients of individual attention heads and of the entire network with respect to the bi-Haar and tri-Haar bases (respectively) on the space of queries, keys, and heads of the network. To showcase the utility of our theoretical and methodological findings, we provide computational examples using vision and language transformers. The ramifications of these findings are two-fold: (1) a subsequent step in interpretability analysis is theoretically admitted and can be exploited empirically for downstream interpretability tasks; and (2) one can use the network 3-tensor organization for empirical network applications such as model pruning (by virtue of network sparsity) and network architecture comparison.
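The abstract's notion of tri-Haar sparsity can be illustrated with a minimal sketch: stack attention maps into a (heads × queries × keys) 3-tensor, expand it in a Haar basis along each axis, and count near-zero coefficients. This is not the authors' pipeline — their Haar bases are induced by data-driven hierarchical partition trees, whereas the sketch below substitutes a plain dyadic Haar transform (axis sizes are powers of two) and a synthetic block-structured tensor purely for illustration.

```python
import numpy as np

def haar_matrix(n):
    # Orthonormal dyadic Haar matrix for n a power of two, built recursively:
    # the top half averages adjacent pairs, the bottom half differences them.
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])              # coarse (averaging) rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # fine (difference) rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def tri_haar_coeffs(T):
    # Expand a 3-tensor in the tensor-product (tri-Haar) basis by applying
    # the Haar transform along each of the head, query, and key axes.
    for ax, n in enumerate(T.shape):
        H = haar_matrix(n)
        T = np.moveaxis(np.tensordot(H, np.moveaxis(T, ax, 0), axes=1), 0, ax)
    return T

# Toy "network 3-tensor" (8 heads, 32 queries, 32 keys): constant on dyadic
# blocks, i.e. organized/regular with respect to the dyadic tree, so its
# tri-Haar expansion is extremely sparse.
rng = np.random.default_rng(0)
small = rng.standard_normal((2, 4, 4))
T = np.repeat(np.repeat(np.repeat(small, 4, axis=0), 8, axis=1), 8, axis=2)

C = tri_haar_coeffs(T)
thresh = 0.01 * np.abs(C).max()
sparsity = np.mean(np.abs(C) < thresh)
print(f"fraction of near-zero tri-Haar coefficients: {sparsity:.3f}")
```

Only coefficients at scales coarser than the block size survive, so nearly all of the 8192 expansion coefficients vanish; in the paper's setting the analogous sparsity (with respect to learned partition trees rather than dyadic ones) is what admits applications such as pruning.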

@article{fasina2025_2506.15541,
  title={Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity},
  author={Oluwadamilola Fasina and Ruben V.C. Pohle and Pei-Chun Su and Ronald R. Coifman},
  journal={arXiv preprint arXiv:2506.15541},
  year={2025}
}