Are Transformers universal approximators of sequence-to-sequence functions? (arXiv:1912.10077)
20 December 2019
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
Papers citing "Are Transformers universal approximators of sequence-to-sequence functions?" (50 of 246 papers shown)
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion
Dylan Zhang
Curt Tigges
Zory Zhang
Stella Biderman
Maxim Raginsky
Talia Ringer
24
12
0
23 Jan 2024
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Saurav Pawar
S.M. Towhidul Islam Tonmoy
S. M. M. Zaman
Vinija Jain
Aman Chadha
Amitava Das
37
28
0
15 Jan 2024
A mathematical perspective on Transformers
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
EDL
AI4CE
48
36
0
17 Dec 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Kaiyue Wen
Yuchen Li
Bing Liu
Andrej Risteski
29
21
0
03 Dec 2023
Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers
Haowen Pan
Yixin Cao
Xiaozhi Wang
Xun Yang
Meng Wang
KELM
44
25
0
13 Nov 2023
What Formal Languages Can Transformers Express? A Survey
Lena Strobl
William Merrill
Gail Weiss
David Chiang
Dana Angluin
AI4CE
20
49
0
01 Nov 2023
Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding
Jiangyan Ma
Yifei Wang
Yisen Wang
36
13
0
28 Oct 2023
Positional Encoding-based Resident Identification in Multi-resident Smart Homes
Zhiyi Song
Dipankar Chaki
Abdallah Lakhdari
A. Bouguettaya
21
2
0
27 Oct 2023
The Expressive Power of Low-Rank Adaptation
Yuchen Zeng
Kangwook Lee
41
51
0
26 Oct 2023
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions
Kazuki Irie
Róbert Csordás
Jürgen Schmidhuber
36
11
0
24 Oct 2023
On the Optimization and Generalization of Multi-head Attention
Puneesh Deora
Rouzbeh Ghaderi
Hossein Taheri
Christos Thrampoulidis
MLT
52
34
0
19 Oct 2023
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
Licong Lin
Yu Bai
Song Mei
OffRL
37
45
0
12 Oct 2023
Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure
Haotong Yang
Fanxu Meng
Zhouchen Lin
Muhan Zhang
LRM
31
2
0
09 Oct 2023
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
S. Bhattamishra
Arkil Patel
Phil Blunsom
Varun Kanade
27
45
0
04 Oct 2023
Spherical Position Encoding for Transformers
Eren Unlu
13
0
0
04 Oct 2023
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian
Yiping Wang
Zhenyu Zhang
Beidi Chen
Simon S. Du
40
36
0
01 Oct 2023
Auto-Regressive Next-Token Predictors are Universal Learners
Eran Malach
LRM
24
36
0
13 Sep 2023
Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis
Li Du
Yequan Wang
Xingrun Xing
Yiqun Ya
Xiang Li
Xin Jiang
Xuezhi Fang
HILM
33
13
0
11 Sep 2023
Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning
Sungjun Cho
Seunghyuk Cho
Sungwoo Park
Hankook Lee
Ho Hin Lee
Moontae Lee
39
6
0
08 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Marcel Hirt
Domenico Campolo
Victoria Leong
Juan-Pablo Ortega
DRL
15
0
0
01 Sep 2023
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka
Issei Sato
31
16
0
26 Jul 2023
Word Sense Disambiguation as a Game of Neurosymbolic Darts
Tiansi Dong
R. Sifa
23
2
0
25 Jul 2023
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Hengyu Fu
Tianyu Guo
Yu Bai
Song Mei
MLT
40
22
0
21 Jul 2023
Transformers are Universal Predictors
Sourya Basu
Moulik Choraria
L. Varshney
28
4
0
15 Jul 2023
Teaching Arithmetic to Small Transformers
Nayoung Lee
Kartik K. Sreenivasan
Jason D. Lee
Kangwook Lee
Dimitris Papailiopoulos
LRM
32
82
0
07 Jul 2023
Sumformer: Universal Approximation for Efficient Transformers
Silas Alberti
Niclas Dern
L. Thesing
Gitta Kutyniok
27
16
0
05 Jul 2023
Max-Margin Token Selection in Attention Mechanism
Davoud Ataee Tarzanagh
Yingcong Li
Xuechen Zhang
Samet Oymak
40
40
0
23 Jun 2023
Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang
Spencer Frei
Peter L. Bartlett
34
182
0
16 Jun 2023
On the Role of Attention in Prompt-tuning
Samet Oymak
A. S. Rawat
Mahdi Soltanolkotabi
Christos Thrampoulidis
MLT
LRM
25
41
0
06 Jun 2023
Representational Strengths and Limitations of Transformers
Clayton Sanford
Daniel J. Hsu
Matus Telgarsky
22
81
0
05 Jun 2023
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance
Jinwoo Kim
Tien Dat Nguyen
Ayhan Suleymanzade
Hyeokjun An
Seunghoon Hong
50
23
0
05 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao Song
Dinesh Manocha
29
23
0
04 Jun 2023
Memorization Capacity of Multi-Head Attention in Transformers
Sadegh Mahdavi
Renjie Liao
Christos Thrampoulidis
26
23
0
03 Jun 2023
CrystalGPT: Enhancing system-to-system transferability in crystallization prediction and control using time-series-transformers
Niranjan Sitapure
J. Kwon
26
51
0
31 May 2023
What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization
Yufeng Zhang
Fengzhuo Zhang
Zhuoran Yang
Zhaoran Wang
BDL
36
65
0
30 May 2023
Smooth, exact rotational symmetrization for deep learning on point clouds
Sergey Pozdnyakov
Michele Ceriotti
3DPC
42
26
0
30 May 2023
Dissecting Chain-of-Thought: Compositionality through In-Context Filtering and Learning
Yingcong Li
Kartik K. Sreenivasan
Angeliki Giannou
Dimitris Papailiopoulos
Samet Oymak
LRM
18
16
0
30 May 2023
Universality and Limitations of Prompt Tuning
Yihan Wang
Jatin Chauhan
Wei Wang
Cho-Jui Hsieh
57
17
0
30 May 2023
When Does Optimizing a Proper Loss Yield Calibration?
Jarosław Błasiok
Parikshit Gopalan
Lunjia Hu
Preetum Nakkiran
39
24
0
30 May 2023
Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
Shokichi Takakura
Taiji Suzuki
20
17
0
30 May 2023
How Powerful are Decoder-Only Transformer Neural Models?
Jesse Roberts
BDL
20
16
0
26 May 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
36
72
0
25 May 2023
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng
Bohang Zhang
Yuntian Gu
Haotian Ye
Di He
Liwei Wang
LRM
47
224
0
24 May 2023
On Structural Expressive Power of Graph Transformers
Wenhao Zhu
Tianyu Wen
Guojie Song
Liangji Wang
Bo Zheng
27
15
0
23 May 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
30
46
0
09 May 2023
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li
Zhao Song
Yu Xia
Tong Yu
Dinesh Manocha
41
37
0
26 Apr 2023
A Latent Space Theory for Emergent Abilities in Large Language Models
Hui Jiang
LRM
25
35
0
19 Apr 2023
Pretrained Language Models as Visual Planners for Human Assistance
Dhruvesh Patel
H. Eghbalzadeh
Nitin Kamra
Michael L. Iuzzolino
Unnat Jain
Ruta Desai
LM&Ro
19
24
0
17 Apr 2023
Understanding the Role of the Projector in Knowledge Distillation
Roy Miles
K. Mikolajczyk
27
21
0
20 Mar 2023
Do Transformers Parse while Predicting the Masked Word?
Haoyu Zhao
A. Panigrahi
Rong Ge
Sanjeev Arora
76
31
0
14 Mar 2023