Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing
arXiv:2401.17426 · 30 January 2024

Papers citing "Superiority of Multi-Head Attention in In-Context Linear Regression" (19 papers shown)

How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang (02 May 2025)

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang (17 Oct 2024)

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li, Songtao Lu, Pin-Yu Chen, Xiaodong Cui, Meng Wang (03 Oct 2024)

Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention
Yichuan Deng, Zhao Song, Dinesh Manocha (18 Oct 2023)

In-Context Learning through the Bayesian Prism
Madhuri Panwar, Kabir Ahuja, Navin Goyal (08 Jun 2023)

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Yu Bai, Fan Chen, Haiquan Wang, Caiming Xiong, Song Mei (07 Jun 2023)

Memorization Capacity of Multi-Head Attention in Transformers
Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis (03 Jun 2023)

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li, Ming Wang, Sijia Liu, Pin-Yu Chen (12 Feb 2023)

Transformers learn in-context by gradient descent
J. von Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov (15 Dec 2022)

Active Example Selection for In-Context Learning
Yiming Zhang, Shi Feng, Chenhao Tan (08 Nov 2022)

Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, ..., Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason W. Wei (06 Oct 2022)

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant (01 Aug 2022)

Overcoming a Theoretical Limitation of Self-Attention
David Chiang, Peter A. Cholak (24 Feb 2022)

MetaICL: Learning to Learn In Context
Sewon Min, M. Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi (29 Oct 2021)

Noisy Channel Language Model Prompting for Few-Shot Text Classification
Sewon Min, Michael Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer (09 Aug 2021)

Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber (22 Feb 2021)

What Makes Good In-Context Examples for GPT-3?
Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen (17 Jan 2021)

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret (29 Jun 2020)

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn (16 Jun 2019)