Concise One-Layer Transformers Can Do Function Evaluation (Sometimes)

28 March 2025
Lena Strobl, Dana Angluin, Robert Frank
arXiv:2503.22076

Papers citing "Concise One-Layer Transformers Can Do Function Evaluation (Sometimes)"

6 papers shown

Strassen Attention: Unlocking Compositional Abilities in Transformers Based on a New Lower Bound Method
Alexander Kozachinskiy, Felipe Urrutia, Hector Jimenez, Tomasz Steifer, Germán Pizarro, Matías Fuentes, Francisco Meza, Cristian Buc, Cristóbal Rojas
31 Jan 2025

Simulating Hard Attention Using Soft Attention
Andy Yang, Lena Strobl, David Chiang, Dana Angluin
13 Dec 2024

One-layer transformers fail to solve the induction heads task
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky
26 Aug 2024

On Limitations of the Transformer Architecture
Binghui Peng, Srini Narayanan, Christos H. Papadimitriou
13 Feb 2024

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato
26 Jul 2023

Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
20 Dec 2019