Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.19442
Cited By
v1
v2 (latest)
Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality
29 February 2024
Siyu Chen
Heejune Sheen
Tianhao Wang
Zhuoran Yang
MLT
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality"
12 / 12 papers shown
Title
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality
Ruhan Wang
Zhiyong Wang
Chengkai Huang
Rui Wang
Tong Yu
Lina Yao
John C. S. Lui
Dongruo Zhou
21
0
0
09 Jun 2025
On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li
Chenyang Zhang
Xingwu Chen
Yuan Cao
Difan Zou
126
2
0
24 Feb 2025
Transformers versus the EM Algorithm in Multi-class Clustering
Yihan He
Hong-Yu Chen
Yuan Cao
Jianqing Fan
Han Liu
103
2
0
09 Feb 2025
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang
Aaditya K. Singh
Peter E. Latham
Andrew Saxe
MLT
142
5
0
27 Jan 2025
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu
Ruida Zhou
Cong Shen
Jing Yang
144
0
0
17 Oct 2024
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Wei Shen
Ruida Zhou
Jing Yang
Cong Shen
81
4
0
15 Oct 2024
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
Binghui Li
Yuanzhi Li
OOD
96
4
0
11 Oct 2024
Task Diversity Shortens the ICL Plateau
Jaeyeon Kim
Sehyun Kwon
Joo Young Choi
Jongho Park
Jaewoong Cho
Jason D. Lee
Ernest K. Ryu
MoMe
101
3
0
07 Oct 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen
Huaqing Zhang
Hongzhou Lin
Jingzhao Zhang
MoE
LRM
185
7
0
07 Oct 2024
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li
Songtao Lu
Pin-Yu Chen
Xiaodong Cui
Meng Wang
LRM
100
6
0
03 Oct 2024
Spin glass model of in-context learning
Yuhao Li
Ruoran Bai
Haiping Huang
LRM
143
0
0
05 Aug 2024
Transformers are Provably Optimal In-context Estimators for Wireless Communications
Vishnu Teja Kunde
Vicram Rajagopalan
Chandra Shekhara Kaushik Valmeekam
Krishna R. Narayanan
S. Shakkottai
D. Kalathil
J. Chamberland
144
6
0
01 Nov 2023
1