Approximation Rate of the Transformer Architecture for Sequence Modeling
Hao Jiang, Qianxiao Li
arXiv:2305.18475, v4 (latest), 3 January 2025
Papers citing "Approximation Rate of the Transformer Architecture for Sequence Modeling" (30 papers shown)
Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions
Haotian Jiang, Zeyu Bao, Shida Wang, Qianxiao Li (06 Jun 2025)
Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan (18 Apr 2025)
Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan (16 Apr 2025)
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato (02 Oct 2024)
Anchor function: a type of benchmark functions for studying language models
Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, E. Weinan, Z. Xu (16 Jan 2024)
A mathematical perspective on Transformers
Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet (17 Dec 2023)
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato (26 Jul 2023)
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Yu Bai, Fan Chen, Haiquan Wang, Caiming Xiong, Song Mei (07 Jun 2023)
Representational Strengths and Limitations of Transformers
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky (05 Jun 2023)
Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
Shokichi Takakura, Taiji Suzuki (30 May 2023)
Looped Transformers as Programmable Computers
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos (30 Jan 2023)
Are Transformers Effective for Time Series Forecasting?
Ailing Zeng, Mu-Hwa Chen, L. Zhang, Qiang Xu (26 May 2022)
Your Transformer May Not be as Powerful as You Expect
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He (26 May 2022)
On the rate of convergence of a classifier based on a Transformer encoder
Iryna Gurevych, Michael Kohler, Gözde Gül Sahin (29 Nov 2021)
Can Vision Transformers Perform Convolution?
Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh (02 Nov 2021)
Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang (19 Oct 2021)
Universal Approximation Under Constraints is Possible with Transformers
Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanić (07 Oct 2021)
Approximation Theory of Convolutional Architectures for Time Series Modelling
Haotian Jiang, Zhong Li, Qianxiao Li (20 Jul 2021)
Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long (24 Jun 2021)
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas (05 Mar 2021)
Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
Zuowei Shen, Haizhao Yang, Shijun Zhang (28 Feb 2021)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby (22 Oct 2020)
The Depth-to-Width Interplay in Self-Attention
Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua (22 Jun 2020)
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar (08 Jun 2020)
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei (28 May 2020)
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar (17 Feb 2020)
Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar (20 Dec 2019)
On the Relationship between Self-Attention and Convolutional Layers
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi (08 Nov 2019)
Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin (12 Jun 2017)
Approximation by Combinations of ReLU and Squared ReLU Ridge Functions with ℓ¹ and ℓ⁰ Controls
Jason M. Klusowski, Andrew R. Barron (26 Jul 2016)