Emergence of meta-stable clustering in mean-field transformer models

30 October 2024

Papers citing "Emergence of meta-stable clustering in mean-field transformer models"

Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians. Akiyoshi Tomihari, Ryo Karakida. 26 May 2025.
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency. Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis. 16 May 2025.
Quantitative Clustering in Mean-Field Transformer Models. Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet. 20 Apr 2025.
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization. Kelvin Kan, Xingjian Li, Stanley Osher. 30 Jan 2025.
Clustering in Causal Attention Masking. Nikita Karagodin, Yury Polyanskiy, Philippe Rigollet. 07 Nov 2024.
Dynamic metastability in the self-attention model. Borjan Geshkovski, Hugo Koubbi, Yury Polyanskiy, Philippe Rigollet. 09 Oct 2024.
Synchronization on circles and spheres with nonlinear interactions. Christopher Criscitiello, Quentin Rebjock, Andrew D. McRae, Nicolas Boumal. 28 May 2024.
Geometric Dynamics of Signal Propagation Predict Trainability of Transformers. Aditya Cowsik, Tamra M. Nebabu, Xiao-Liang Qi, Surya Ganguli. 05 Mar 2024.
A mathematical perspective on Transformers. Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet. 17 Dec 2023.
The emergence of clusters in self-attention dynamics. Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet. 09 May 2023.
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime. Andrea Agazzi, Jianfeng Lu. 22 Oct 2020.
Quantitative Propagation of Chaos for SGD in Wide Neural Networks. Valentin De Bortoli, Alain Durmus, Xavier Fontaine, Umut Simsekli. 13 Jul 2020.
Neural Ordinary Differential Equations. T. Chen, Yulia Rubanova, J. Bettencourt, David Duvenaud. 19 Jun 2018.
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport. Lénaïc Chizat, Francis R. Bach. 24 May 2018.
Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach. Grant M. Rotskoff, Eric Vanden-Eijnden. 02 May 2018.
A Mean Field View of the Landscape of Two-Layers Neural Networks. Song Mei, Andrea Montanari, Phan-Minh Nguyen. 18 Apr 2018.
Attention Is All You Need. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. 12 Jun 2017.
Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. 01 Sep 2014.