Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato. 26 July 2023. arXiv:2307.14023.

Papers citing "Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?" (18 papers shown):

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions. Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu. 26 May 2025.
Approximation Rate of the Transformer Architecture for Sequence Modeling. Hao Jiang, Qianxiao Li. 03 Jan 2025.
Attention layers provably solve single-location regression. Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer. 02 Oct 2024.
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding. Kevin Xu, Issei Sato. 02 Oct 2024.
Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment. Naoya Hasegawa, Issei Sato. 26 Sep 2024.
Differentially Private Kernel Density Estimation. Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu. 03 Sep 2024.
Memorization Capacity of Multi-Head Attention in Transformers. Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis. 03 Jun 2023.
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. Sophie Hao, Dana Angluin, Robert Frank. 13 Apr 2022.
Overcoming a Theoretical Limitation of Self-Attention. David Chiang, Peter A. Cholak. 24 Feb 2022.
On the Expressive Power of Self-Attention Matrices. Valerii Likhosherstov, K. Choromanski, Adrian Weller. 07 Jun 2021.
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning. Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta. 22 Dec 2020.
Linformer: Self-Attention with Linear Complexity. Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. 08 Jun 2020.
Low-Rank Bottleneck in Multi-head Attention Models. Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar. 17 Feb 2020.
RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov. 26 Jul 2019.
Theoretical Limitations of Self-Attention in Neural Sequence Models. Michael Hahn. 16 Jun 2019.
Reconciling modern machine learning practice and the bias-variance trade-off. M. Belkin, Daniel J. Hsu, Siyuan Ma, Soumik Mandal. 28 Dec 2018.
The Expressive Power of Neural Networks: A View from the Width. Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, Liwei Wang. 08 Sep 2017.
Identity Matters in Deep Learning. Moritz Hardt, Tengyu Ma. 14 Nov 2016.