Fast Attention Requires Bounded Entries (arXiv:2302.13214)
Josh Alman, Zhao Song
26 February 2023
Papers citing "Fast Attention Requires Bounded Entries" (14 of 14 shown):
- Minimalist Softmax Attention Provably Learns Constrained Boolean Functions (26 May 2025). Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu.
- Attention Condensation via Sparsity Induced Regularized Training (3 Mar 2025). Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd.
- Looped ReLU MLPs May Be All You Need as Practical Programmable Computers (21 Feb 2025). Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou.
- Fast Gradient Computation for RoPE Attention in Almost Linear Time (3 Jan 2025). Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song.
- HSR-Enhanced Sparse Attention Acceleration (14 Oct 2024). Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song.
- Fundamental Limitations on Subquadratic Alternatives to Transformers (5 Oct 2024). Josh Alman, Hantao Yu.
- Differentially Private Kernel Density Estimation (3 Sep 2024). Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu.
- When big data actually are low-rank, or entrywise approximation of certain function-generated matrices (3 Jul 2024). Stanislav Budzinskiy.
- Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention (18 Oct 2023). Yichuan Deng, Zhao Song, Dinesh Manocha.
- A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time (14 Sep 2023). Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin.
- Differentially Private Attention Computation (8 May 2023). Yeqi Gao, Zhao Song, Xin Yang.
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (29 Jun 2020). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret.
- Linformer: Self-Attention with Linear Complexity (8 Jun 2020). Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma.
- Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel (30 Aug 2019). Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov.