ResearchTrend.AI
Emergence of meta-stable clustering in mean-field transformer models


30 October 2024
Giuseppe Bruno
Federico Pasqualotto
Andrea Agazzi

Papers citing "Emergence of meta-stable clustering in mean-field transformer models"

18 / 18 papers shown
Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
Akiyoshi Tomihari
Ryo Karakida
19
0
0
26 May 2025
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency
Kelvin Kan
Xingjian Li
Benjamin J. Zhang
Tuhin Sahai
Stanley Osher
Markos A. Katsoulakis
18
0
0
16 May 2025
Quantitative Clustering in Mean-Field Transformer Models
Shi Chen
Zhengjiang Lin
Yury Polyanskiy
Philippe Rigollet
70
2
0
20 Apr 2025
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization
Kelvin Kan
Xingjian Li
Stanley Osher
131
2
0
30 Jan 2025
Clustering in Causal Attention Masking
Nikita Karagodin
Yury Polyanskiy
Philippe Rigollet
91
7
0
07 Nov 2024
Dynamic metastability in the self-attention model
Borjan Geshkovski
Hugo Koubbi
Yury Polyanskiy
Philippe Rigollet
31
8
0
09 Oct 2024
Synchronization on circles and spheres with nonlinear interactions
Christopher Criscitiello
Quentin Rebjock
Andrew D. McRae
Nicolas Boumal
60
4
0
28 May 2024
Geometric Dynamics of Signal Propagation Predict Trainability of Transformers
Aditya Cowsik
Tamra M. Nebabu
Xiao-Liang Qi
Surya Ganguli
48
11
0
05 Mar 2024
A mathematical perspective on Transformers
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
EDL
AI4CE
67
39
0
17 Dec 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
51
49
0
09 May 2023
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Andrea Agazzi
Jianfeng Lu
34
16
0
22 Oct 2020
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Valentin De Bortoli
Alain Durmus
Xavier Fontaine
Umut Simsekli
55
25
0
13 Jul 2020
Neural Ordinary Differential Equations
T. Chen
Yulia Rubanova
J. Bettencourt
David Duvenaud
AI4CE
253
5,024
0
19 Jun 2018
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Lénaïc Chizat
Francis R. Bach
OT
166
731
0
24 May 2018
Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach
Grant M. Rotskoff
Eric Vanden-Eijnden
89
119
0
02 May 2018
A Mean Field View of the Landscape of Two-Layers Neural Networks
Song Mei
Andrea Montanari
Phan-Minh Nguyen
MLT
76
855
0
18 Apr 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
506
129,831
0
12 Jun 2017
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
410
27,205
0
01 Sep 2014