2410.23228
Cited By
Emergence of meta-stable clustering in mean-field transformer models
30 October 2024
Giuseppe Bruno
Federico Pasqualotto
Andrea Agazzi
Papers citing "Emergence of meta-stable clustering in mean-field transformer models"
18 / 18 papers shown
Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
Akiyoshi Tomihari
Ryo Karakida
19
0
0
26 May 2025
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency
Kelvin Kan
Xingjian Li
Benjamin J. Zhang
Tuhin Sahai
Stanley Osher
Markos A. Katsoulakis
18
0
0
16 May 2025
Quantitative Clustering in Mean-Field Transformer Models
Shi Chen
Zhengjiang Lin
Yury Polyanskiy
Philippe Rigollet
70
2
0
20 Apr 2025
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization
Kelvin Kan
Xingjian Li
Stanley Osher
131
2
0
30 Jan 2025
Clustering in Causal Attention Masking
Nikita Karagodin
Yury Polyanskiy
Philippe Rigollet
91
7
0
07 Nov 2024
Dynamic metastability in the self-attention model
Borjan Geshkovski
Hugo Koubbi
Yury Polyanskiy
Philippe Rigollet
29
8
0
09 Oct 2024
Synchronization on circles and spheres with nonlinear interactions
Christopher Criscitiello
Quentin Rebjock
Andrew D. McRae
Nicolas Boumal
60
4
0
28 May 2024
Geometric Dynamics of Signal Propagation Predict Trainability of Transformers
Aditya Cowsik
Tamra M. Nebabu
Xiao-Liang Qi
Surya Ganguli
48
11
0
05 Mar 2024
A mathematical perspective on Transformers
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
EDL
AI4CE
65
39
0
17 Dec 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
51
49
0
09 May 2023
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Andrea Agazzi
Jianfeng Lu
34
16
0
22 Oct 2020
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Valentin De Bortoli
Alain Durmus
Xavier Fontaine
Umut Simsekli
55
25
0
13 Jul 2020
Neural Ordinary Differential Equations
T. Chen
Yulia Rubanova
J. Bettencourt
David Duvenaud
AI4CE
251
5,024
0
19 Jun 2018
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Lénaïc Chizat
Francis R. Bach
OT
164
731
0
24 May 2018
Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach
Grant M. Rotskoff
Eric Vanden-Eijnden
89
119
0
02 May 2018
A Mean Field View of the Landscape of Two-Layers Neural Networks
Song Mei
Andrea Montanari
Phan-Minh Nguyen
MLT
76
855
0
18 Apr 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
501
129,831
0
12 Jun 2017
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
406
27,205
0
01 Sep 2014