Approximation Rate of the Transformer Architecture for Sequence Modeling (arXiv:2305.18475)
Hao Jiang, Qianxiao Li
3 January 2025

Papers citing "Approximation Rate of the Transformer Architecture for Sequence Modeling"

30 papers shown

Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions
Haotian Jiang, Zeyu Bao, Shida Wang, Qianxiao Li
55 · 0 · 0 · 06 Jun 2025

Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan
62 · 0 · 0 · 18 Apr 2025

Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan
143 · 0 · 0 · 16 Apr 2025

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato
126 · 4 · 0 · 02 Oct 2024

Anchor function: a type of benchmark functions for studying language models
Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, E. Weinan, Z. Xu
118 · 7 · 0 · 16 Jan 2024

A mathematical perspective on Transformers
Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet
EDL, AI4CE
138 · 47 · 0 · 17 Dec 2023

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato
129 · 18 · 0 · 26 Jul 2023

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Yu Bai, Fan Chen, Haiquan Wang, Caiming Xiong, Song Mei
54 · 198 · 0 · 07 Jun 2023

Representational Strengths and Limitations of Transformers
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky
76 · 93 · 0 · 05 Jun 2023

Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
Shokichi Takakura, Taiji Suzuki
88 · 20 · 0 · 30 May 2023

Looped Transformers as Programmable Computers
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos
103 · 107 · 0 · 30 Jan 2023

Are Transformers Effective for Time Series Forecasting?
Ailing Zeng, Mu-Hwa Chen, L. Zhang, Qiang Xu
AI4TS
184 · 1,839 · 0 · 26 May 2022

Your Transformer May Not be as Powerful as You Expect
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He
139 · 54 · 0 · 26 May 2022

On the rate of convergence of a classifier based on a Transformer encoder
Iryna Gurevych, Michael Kohler, Gözde Gül Sahin
64 · 15 · 0 · 29 Nov 2021

Can Vision Transformers Perform Convolution?
Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh
ViT
110 · 21 · 0 · 02 Nov 2021

Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
106 · 125 · 0 · 19 Oct 2021

Universal Approximation Under Constraints is Possible with Transformers
Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanić
139 · 28 · 0 · 07 Oct 2021

Approximation Theory of Convolutional Architectures for Time Series Modelling
Haotian Jiang, Zhong Li, Qianxiao Li
AI4TS
83 · 12 · 0 · 20 Jul 2021

Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long
AI4TS
124 · 2,371 · 0 · 24 Jun 2021

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
166 · 388 · 0 · 05 Mar 2021

Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
Zuowei Shen, Haizhao Yang, Shijun Zhang
211 · 120 · 0 · 28 Feb 2021

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
ViT
770 · 41,877 · 0 · 22 Oct 2020

The Depth-to-Width Interplay in Self-Attention
Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua
137 · 46 · 0 · 22 Jun 2020

$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
65 · 84 · 0 · 08 Jun 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL
1.2K · 42,712 · 0 · 28 May 2020

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
76 · 96 · 0 · 17 Feb 2020

Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
140 · 360 · 0 · 20 Dec 2019

On the Relationship between Self-Attention and Convolutional Layers
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
159 · 535 · 0 · 08 Nov 2019

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV
980 · 133,429 · 0 · 12 Jun 2017

Approximation by Combinations of ReLU and Squared ReLU Ridge Functions with $\ell^1$ and $\ell^0$ Controls
Jason M. Klusowski, Andrew R. Barron
292 · 143 · 0 · 26 Jul 2016