Fast Attention Requires Bounded Entries (arXiv:2302.13214)
Josh Alman, Zhao Song
26 February 2023
Papers citing "Fast Attention Requires Bounded Entries" (14 of 14 shown):
- Minimalist Softmax Attention Provably Learns Constrained Boolean Functions (26 May 2025). Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu.
- Attention Condensation via Sparsity Induced Regularized Training (3 Mar 2025). Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd.
- Looped ReLU MLPs May Be All You Need as Practical Programmable Computers (21 Feb 2025). Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou.
- Fast Gradient Computation for RoPE Attention in Almost Linear Time (3 Jan 2025). Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song.
- HSR-Enhanced Sparse Attention Acceleration (14 Oct 2024). Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song.
- Fundamental Limitations on Subquadratic Alternatives to Transformers (5 Oct 2024). Josh Alman, Hantao Yu.
- Differentially Private Kernel Density Estimation (3 Sep 2024). Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu.
- When big data actually are low-rank, or entrywise approximation of certain function-generated matrices (3 Jul 2024). Stanislav Budzinskiy.
- Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention (18 Oct 2023). Yichuan Deng, Zhao Song, Dinesh Manocha.
- A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time (14 Sep 2023). Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin.
- Differentially Private Attention Computation (8 May 2023). Yeqi Gao, Zhao Song, Xin Yang.
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (29 Jun 2020). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret.
- Linformer: Self-Attention with Linear Complexity (8 Jun 2020). Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma.
- Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel (30 Aug 2019). Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov.