Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato
26 July 2023 · arXiv:2307.14023

Papers citing "Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?"

18 / 18 papers shown

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions (MLT)
Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu
26 May 2025

Approximation Rate of the Transformer Architecture for Sequence Modeling
Hao Jiang, Qianxiao Li
03 Jan 2025

Attention layers provably solve single-location regression
Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer
02 Oct 2024

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato
02 Oct 2024

Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment
Naoya Hasegawa, Issei Sato
26 Sep 2024

Differentially Private Kernel Density Estimation
Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu
03 Sep 2024

Memorization Capacity of Multi-Head Attention in Transformers
Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis
03 Jun 2023

Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
Sophie Hao, Dana Angluin, Robert Frank
13 Apr 2022

Overcoming a Theoretical Limitation of Self-Attention
David Chiang, Peter A. Cholak
24 Feb 2022

On the Expressive Power of Self-Attention Matrices
Valerii Likhosherstov, K. Choromanski, Adrian Weller
07 Jun 2021

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta
22 Dec 2020

Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
08 Jun 2020

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
17 Feb 2020

RoBERTa: A Robustly Optimized BERT Pretraining Approach (AIMat)
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
26 Jul 2019

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
16 Jun 2019

Reconciling modern machine learning practice and the bias-variance trade-off
M. Belkin, Daniel J. Hsu, Siyuan Ma, Soumik Mandal
28 Dec 2018

The Expressive Power of Neural Networks: A View from the Width
Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, Liwei Wang
08 Sep 2017

Identity Matters in Deep Learning (OOD)
Moritz Hardt, Tengyu Ma
14 Nov 2016