The FFT Strikes Again: An Efficient Alternative to Self-Attention

Conventional self-attention mechanisms incur quadratic complexity in sequence length, making them difficult to scale to long inputs. We present FFTNet, an adaptive spectral filtering framework that uses the Fast Fourier Transform (FFT) to achieve global token mixing in \(\mathcal{O}(n\log n)\) time. By mapping inputs into the frequency domain, FFTNet exploits the orthogonality and energy preservation guaranteed by Parseval's theorem to model long-range dependencies efficiently. Our main theoretical contributions include: (1) an adaptive spectral filter that emphasizes salient frequency components; (2) a hybrid scheme combining local windowing with a global FFT branch; and (3) nonlinear feature transformations applied in both the frequency and token domains. Experiments on Long Range Arena and ImageNet validate our theoretical insights and demonstrate superior performance over fixed Fourier-based and standard attention models.
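To make the mechanism concrete, below is a minimal PyTorch sketch of FFT-based global token mixing with an input-conditioned spectral filter. This is an illustrative assumption, not the paper's implementation: the class name AdaptiveSpectralMixing, the mean-pooled context vector, and the gain MLP are hypothetical choices, and the abstract does not specify the exact filter parameterization, the local windowing branch, or the nonlinear transforms.

import torch
import torch.nn as nn

class AdaptiveSpectralMixing(nn.Module):
    # Hypothetical sketch: mix tokens globally via an FFT over the sequence
    # axis, scaled by per-frequency gains predicted from the input itself.
    # Assumes a fixed sequence length known at construction time.
    def __init__(self, seq_len: int, d_model: int):
        super().__init__()
        self.n_freq = seq_len // 2 + 1  # number of rfft bins over tokens
        # Small MLP mapping a global context vector to per-frequency gains;
        # the paper's actual adaptive filter may be parameterized differently.
        self.to_gain = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, self.n_freq),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), real-valued
        b, n, d = x.shape
        ctx = x.mean(dim=1)                      # global context, (b, d)
        gain = self.to_gain(ctx).unsqueeze(-1)   # (b, n_freq, 1), broadcast over channels
        X = torch.fft.rfft(x, dim=1)             # O(n log n) transform over the token axis
        X = gain * X                             # emphasize salient frequency components
        return torch.fft.irfft(X, n=n, dim=1)    # back to the token domain, (b, n, d)

# Usage: the layer preserves the input shape, so it can slot in where
# a self-attention block would otherwise mix tokens.
layer = AdaptiveSpectralMixing(seq_len=1024, d_model=256)
y = layer(torch.randn(2, 1024, 256))  # y.shape == (2, 1024, 256)

Because the only sequence-length-dependent operations are the forward and inverse FFTs, the token-mixing cost is \(\mathcal{O}(n\log n)\) rather than the \(\mathcal{O}(n^2)\) of pairwise attention.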
@article{fein-ashley2025_2502.18394,
  title   = {The FFT Strikes Again: An Efficient Alternative to Self-Attention},
  author  = {Jacob Fein-Ashley and Rajgopal Kannan and Viktor Prasanna},
  journal = {arXiv preprint arXiv:2502.18394},
  year    = {2025}
}