
Minimalist Softmax Attention Provably Learns Constrained Boolean Functions

Abstract

We study the computational limits of learning k-bit Boolean functions (specifically, AND, OR, and their noisy variants) using a minimalist single-head softmax-attention mechanism, where k = Θ(d) relevant bits are selected from d inputs. We show that these simple AND and OR functions are unsolvable with a single-head softmax-attention mechanism alone. However, with teacher forcing, the same minimalist attention is capable of solving them. These findings offer two key insights: Architecturally, solving these Boolean tasks requires only minimalist attention, without deep Transformer blocks or FFNs. Methodologically, one gradient descent update with supervision suffices and replaces the multi-step Chain-of-Thought (CoT) reasoning scheme of [Kim and Suzuki, ICLR 2025] for solving Boolean problems. Together, the bounds expose a fundamental gap between what this minimal architecture achieves under ideal supervision and what is provably impossible under standard training.
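To give intuition for why a single softmax head can express such functions at all, here is a hypothetical NumPy sketch (not the paper's construction): with logits proportional to -β·x_i over the relevant positions, softmax attention concentrates its weight on the smallest relevant bit, so the value readout approximates min_i x_i = AND of the selected bits. The function name, the mask, and the temperature β are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_and(x, relevant, beta=20.0):
    """Approximate AND over the `relevant` bits of x with one softmax head.

    Logits -beta * x_i (restricted to the relevant positions) put nearly
    all attention weight on the smallest relevant bit, so the readout
    w @ x is close to min_i x_i = AND(x_relevant). Illustrative sketch only.
    """
    x = np.asarray(x, dtype=float)
    logits = np.full(len(x), -np.inf)     # mask out irrelevant positions
    logits[relevant] = -beta * x[relevant]
    w = softmax(logits)                   # attention weights over positions
    return float(w @ x)                   # value readout

relevant = [0, 2, 3]
print(round(attention_and([1, 0, 1, 1], relevant)))  # AND(1, 1, 1) -> 1
print(round(attention_and([1, 1, 0, 1], relevant)))  # AND(1, 0, 1) -> 0
```

Flipping the sign of the logits (+β·x_i) makes the head attend to the largest relevant bit instead, yielding an analogous approximation of OR.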

@article{hu2025_2505.19531,
  title={Minimalist Softmax Attention Provably Learns Constrained Boolean Functions},
  author={Jerry Yao-Chieh Hu and Xiwen Zhang and Maojiang Su and Zhao Song and Han Liu},
  journal={arXiv preprint arXiv:2505.19531},
  year={2025}
}