Approximation Rate of the Transformer Architecture for Sequence Modeling (arXiv:2305.18475)
Hao Jiang, Qianxiao Li
3 January 2025

Papers citing "Approximation Rate of the Transformer Architecture for Sequence Modeling"

30 papers shown

Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions
Haotian Jiang, Zeyu Bao, Shida Wang, Qianxiao Li
55 · 0 · 0 · 06 Jun 2025

Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan
62 · 0 · 0 · 18 Apr 2025

Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan
143 · 0 · 0 · 16 Apr 2025

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato
126 · 4 · 0 · 02 Oct 2024

Anchor function: a type of benchmark functions for studying language models
Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, E. Weinan, Z. Xu
118 · 7 · 0 · 16 Jan 2024

A mathematical perspective on Transformers
Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet
EDL, AI4CE
138 · 47 · 0 · 17 Dec 2023

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato
129 · 18 · 0 · 26 Jul 2023

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Yu Bai, Fan Chen, Haiquan Wang, Caiming Xiong, Song Mei
54 · 198 · 0 · 07 Jun 2023

Representational Strengths and Limitations of Transformers
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky
76 · 93 · 0 · 05 Jun 2023

Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
Shokichi Takakura, Taiji Suzuki
88 · 20 · 0 · 30 May 2023

Looped Transformers as Programmable Computers
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos
103 · 107 · 0 · 30 Jan 2023

Are Transformers Effective for Time Series Forecasting?
Ailing Zeng, Mu-Hwa Chen, L. Zhang, Qiang Xu
AI4TS
184 · 1,839 · 0 · 26 May 2022

Your Transformer May Not be as Powerful as You Expect
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He
139 · 54 · 0 · 26 May 2022

On the rate of convergence of a classifier based on a Transformer encoder
Iryna Gurevych, Michael Kohler, Gözde Gül Sahin
64 · 15 · 0 · 29 Nov 2021

Can Vision Transformers Perform Convolution?
Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh
ViT
110 · 21 · 0 · 02 Nov 2021

Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
106 · 125 · 0 · 19 Oct 2021

Universal Approximation Under Constraints is Possible with Transformers
Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanić
139 · 28 · 0 · 07 Oct 2021

Approximation Theory of Convolutional Architectures for Time Series Modelling
Haotian Jiang, Zhong Li, Qianxiao Li
AI4TS
83 · 12 · 0 · 20 Jul 2021

Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long
AI4TS
124 · 2,371 · 0 · 24 Jun 2021

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
166 · 388 · 0 · 05 Mar 2021

Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
Zuowei Shen, Haizhao Yang, Shijun Zhang
211 · 120 · 0 · 28 Feb 2021

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
ViT
770 · 41,877 · 0 · 22 Oct 2020

The Depth-to-Width Interplay in Self-Attention
Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua
137 · 46 · 0 · 22 Jun 2020

$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
65 · 84 · 0 · 08 Jun 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL
1.2K · 42,712 · 0 · 28 May 2020

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
76 · 96 · 0 · 17 Feb 2020

Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
140 · 360 · 0 · 20 Dec 2019

On the Relationship between Self-Attention and Convolutional Layers
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
159 · 535 · 0 · 08 Nov 2019

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV
980 · 133,429 · 0 · 12 Jun 2017

Approximation by Combinations of ReLU and Squared ReLU Ridge Functions with $\ell^1$ and $\ell^0$ Controls
Jason M. Klusowski, Andrew R. Barron
292 · 143 · 0 · 26 Jul 2016