On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Lénaïc Chizat, Francis R. Bach. arXiv:1805.09545, 24 May 2018. [OT]
Papers citing "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport" (showing 50 of 483)
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport. Damien Ferbach, Baptiste Goujaud, Gauthier Gidel, Aymeric Dieuleveut. 29 Oct 2023. [MoMe]
When can transformers reason with abstract symbols? Enric Boix-Adserà, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Josh Susskind. 15 Oct 2023. [LRM, NAI]
Accelerating optimization over the space of probability measures. Shi Chen, Wenxuan Wu, Yuhang Yao, Stephen J. Wright. 06 Oct 2023.
Sampling via Gradient Flows in the Space of Probability Measures. Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart. 05 Oct 2023.
Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks. Sho Sonoda, Hideyuki Ishi, Isao Ishikawa, Masahiro Ikeda. 05 Oct 2023.
Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks. Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda. 05 Oct 2023.
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks. Greg Yang, Dingli Yu, Chen Zhu, Soufiane Hayou. 03 Oct 2023. [MLT]
Spectral Neural Networks: Approximation Theory and Optimization Landscape. Chenghui Li, Rishi Sonthalia, Nicolas García Trillos. 01 Oct 2023.
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention. Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du. 01 Oct 2023.
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit. Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, C. Pehlevan. 28 Sep 2023.
Beyond Log-Concavity: Theory and Algorithm for Sum-Log-Concave Optimization. Mastane Achab. 26 Sep 2023.
Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets. Pulkit Gopalani, Samyak Jha, Anirbit Mukherjee. 17 Sep 2023.
Gradient-Based Feature Learning under Structured Data. Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu. 07 Sep 2023. [MLT]
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences. Samuel Chun-Hei Lam, Justin A. Sirignano, K. Spiliopoulos. 28 Aug 2023.
Six Lectures on Linearized Neural Networks. Theodor Misiakiewicz, Andrea Montanari. 25 Aug 2023.
Nonlinear Hamiltonian Monte Carlo & its Particle Approximation. Nawaf Bou-Rabee, Katharina Schuh. 22 Aug 2023.
Local Kernel Renormalization as a mechanism for feature learning in overparametrized Convolutional Neural Networks. R. Aiudi, R. Pacelli, A. Vezzani, R. Burioni, P. Rotondo. 21 Jul 2023. [MLT]
What can a Single Attention Layer Learn? A Study Through the Random Features Lens. Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei. 21 Jul 2023. [MLT]
Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks. Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai. 13 Jul 2023.
Quantitative CLTs in Deep Neural Networks. Stefano Favaro, Boris Hanin, Domenico Marinucci, I. Nourdin, G. Peccati. 12 Jul 2023. [BDL]
Fundamental limits of overparametrized shallow neural networks for supervised learning. Francesco Camilli, D. Tieplova, Jean Barbier. 11 Jul 2023.
Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference. Arnaud Descours, Tom Huix, Arnaud Guillin, Manon Michel, Eric Moulines, Boris Nectoux. 10 Jul 2023. [BDL]
Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space. Zhengdao Chen. 03 Jul 2023.
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit. Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy. 30 Jun 2023.
The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions. Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe. 17 Jun 2023. [OffRL]
Gradient is All You Need? Konstantin Riedl, T. Klock, Carina Geldhauser, M. Fornasier. 16 Jun 2023.
Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD. Yijun Wan, Melih Barsbey, A. Zaidi, Umut Simsekli. 13 Jun 2023.
Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction. Taiji Suzuki, Denny Wu, Atsushi Nitanda. 12 Jun 2023.
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time. Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan. 29 May 2023. [MLT]
A Rainbow in Deep Network Black Boxes. Florentin Guth, Brice Ménard, G. Rochette, S. Mallat. 29 May 2023.
Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD. Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan. 29 May 2023. [MLT]
Feature-Learning Networks Are Consistent Across Widths At Realistic Scales. Nikhil Vyas, Alexander B. Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, C. Pehlevan. 28 May 2023.
Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks. Puyu Wang, Yunwen Lei, Di Wang, Yiming Ying, Ding-Xuan Zhou. 26 May 2023. [MLT]
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. Yuandong Tian, Yiping Wang, Beidi Chen, S. Du. 25 May 2023. [MLT]
Tight conditions for when the NTK approximation is valid. Enric Boix-Adserà, Etai Littwin. 22 May 2023.
Understanding the Initial Condensation of Convolutional Neural Networks. Zhangchen Zhou, Hanxu Zhou, Yuqing Li, Zhi-Qin John Xu. 17 May 2023. [MLT, AI4CE]
Scalable Optimal Transport Methods in Machine Learning: A Contemporary Survey. Abdelwahed Khamis, Russell Tsuchida, Mohamed Tarek, V. Rolland, Lars Petersson. 08 May 2023. [OT]
Expand-and-Cluster: Parameter Recovery of Neural Networks. Flavio Martinelli, Berfin Simsek, W. Gerstner, Johanni Brea. 25 Apr 2023.
Leveraging the two timescale regime to demonstrate convergence of neural networks. P. Marion, Raphael Berthier. 19 Apr 2023.
Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks. Jing An, Jianfeng Lu. 18 Apr 2023.
Performative Prediction with Neural Networks. Mehrnaz Mofakhami, Ioannis Mitliagkas, Gauthier Gidel. 14 Apr 2023.
Full Gradient Deep Reinforcement Learning for Average-Reward Criterion. Tejas Pagare, Vivek Borkar, Konstantin Avrachenkov. 07 Apr 2023.
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks. Blake Bordelon, C. Pehlevan. 06 Apr 2023. [MLT]
Depth Separation with Multilayer Mean-Field Networks. Y. Ren, Mo Zhou, Rong Ge. 03 Apr 2023. [OOD]
High-dimensional scaling limits and fluctuations of online least-squares SGD with smooth covariance. Krishnakumar Balasubramanian, Promit Ghosal, Ye He. 03 Apr 2023.
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality. François Ged, M. H. Veiga. 22 Mar 2023.
Global Optimality of Elman-type RNN in the Mean-Field Regime. Andrea Agazzi, Jian-Xiong Lu, Sayan Mukherjee. 12 Mar 2023. [MLT]
Phase Diagram of Initial Condensation for Two-layer Neural Networks. Zheng Chen, Yuqing Li, Tao Luo, Zhaoguang Zhou, Z. Xu. 12 Mar 2023. [MLT, AI4CE]
On the Implicit Bias of Linear Equivariant Steerable Networks. Ziyu Chen, Wei-wei Zhu. 07 Mar 2023.
Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems. Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki. 06 Mar 2023.