Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato. 26 July 2023. arXiv:2307.14023.

Papers citing "Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?" (18 papers shown):

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions. Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu. 26 May 2025.
Approximation Rate of the Transformer Architecture for Sequence Modeling. Hao Jiang, Qianxiao Li. 03 Jan 2025.
Attention layers provably solve single-location regression. Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer. 02 Oct 2024.
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding. Kevin Xu, Issei Sato. 02 Oct 2024.
Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment. Naoya Hasegawa, Issei Sato. 26 Sep 2024.
Differentially Private Kernel Density Estimation. Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu. 03 Sep 2024.
Memorization Capacity of Multi-Head Attention in Transformers. Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis. 03 Jun 2023.
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. Sophie Hao, Dana Angluin, Robert Frank. 13 Apr 2022.
Overcoming a Theoretical Limitation of Self-Attention. David Chiang, Peter A. Cholak. 24 Feb 2022.
On the Expressive Power of Self-Attention Matrices. Valerii Likhosherstov, K. Choromanski, Adrian Weller. 07 Jun 2021.
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning. Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta. 22 Dec 2020.
Linformer: Self-Attention with Linear Complexity. Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. 08 Jun 2020.
Low-Rank Bottleneck in Multi-head Attention Models. Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar. 17 Feb 2020.
RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov. 26 Jul 2019.
Theoretical Limitations of Self-Attention in Neural Sequence Models. Michael Hahn. 16 Jun 2019.
Reconciling modern machine learning practice and the bias-variance trade-off. M. Belkin, Daniel J. Hsu, Siyuan Ma, Soumik Mandal. 28 Dec 2018.
The Expressive Power of Neural Networks: A View from the Width. Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, Liwei Wang. 08 Sep 2017.
Identity Matters in Deep Learning. Moritz Hardt, Tengyu Ma. 14 Nov 2016.