On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

24 May 2018

Papers citing "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport"

50 / 483 papers shown

Central Limit Theorem for Bayesian Neural Network trained with Variational Inference Arnaud Descours Tom Huix Arnaud Guillin Manon Michel Eric Moulines Boris Nectoux 33 0 0 10 Jun 2024
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning D. Kunin Allan Raventós Clémentine Dominé Feng Chen David Klindt Andrew M. Saxe Surya Ganguli MLT 48 15 0 10 Jun 2024
Error Bounds of Supervised Classification from Information-Theoretic Perspective Binchuan Qi Wei Gong Li Li 34 0 0 07 Jun 2024
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs Luca Arnaboldi Yatin Dandi Florent Krzakala Bruno Loureiro Luca Pesce Ludovic Stephan 47 1 0 04 Jun 2024
Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks Stefano Sarao Mannelli Yaraslau Ivashinka Andrew M. Saxe Luca Saglietti 42 2 0 03 Jun 2024
Wasserstein gradient flow for optimal probability measure decomposition Jiangze Han Chris Ryan Xin T. Tong 23 1 0 03 Jun 2024
Symmetries in Overparametrized Neural Networks: A Mean-Field View Javier Maass Martínez Joaquin Fontbona FedML MLT 50 2 0 30 May 2024
Diffeomorphic interpolation for efficient persistence-based topological optimization Mathieu Carrière Marc Theveneau Théo Lacombe 21 1 0 29 May 2024
Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes Zhenfeng Tu Santiago Aranguri Arthur Jacot 31 8 0 27 May 2024
Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data Nikita Tsoy Nikola Konstantinov 37 4 0 27 May 2024
Improved Particle Approximation Error for Mean Field Neural Networks Atsushi Nitanda 21 6 0 24 May 2024
Infinite Limits of Multi-head Transformer Dynamics Blake Bordelon Hamza Tahir Chaudhry C. Pehlevan AI4CE 47 9 0 24 May 2024
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions Luca Arnaboldi Yatin Dandi Florent Krzakala Luca Pesce Ludovic Stephan 70 12 0 24 May 2024
Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines Sho Sonoda Yuka Hashimoto Isao Ishikawa Masahiro Ikeda 39 0 0 22 May 2024
Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing Zhongwang Zhang Pengxiao Lin Zhiwei Wang Yaoyu Zhang Z. Xu 39 3 0 08 May 2024
Convergence analysis of controlled particle systems arising in deep learning: from finite to infinite sample size Huafu Liao Alpár R. Mészáros Chenchen Mou Chao Zhou 26 2 0 08 Apr 2024
Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective Shokichi Takakura Taiji Suzuki MLT 22 5 0 22 Mar 2024
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport Raphael Barboni Gabriel Peyré François-Xavier Vialard 37 3 0 19 Mar 2024
Generalization of Scaled Deep ResNets in the Mean-Field Regime Yihang Chen Fanghui Liu Yiping Lu Grigorios G. Chrysos V. Cevher 41 2 0 14 Mar 2024
Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations Akshay Kumar Jarvis Haupt ODL 44 3 0 12 Mar 2024
Analysis of Kernel Mirror Prox for Measure Optimization Pavel Dvurechensky Jia Jie Zhu 31 2 0 29 Feb 2024
Learning Associative Memories with Gradient Descent Vivien A. Cabannes Berfin Simsek A. Bietti 38 6 0 28 Feb 2024
A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks Sho Sonoda Isao Ishikawa Masahiro Ikeda 49 4 0 25 Feb 2024
On the dynamics of three-layer neural networks: initial condensation Zheng-an Chen Tao Luo MLT AI4CE 22 3 0 25 Feb 2024
Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks Akshay Kumar Jarvis Haupt ODL 30 7 0 14 Feb 2024
Depth Separation in Norm-Bounded Infinite-Width Neural Networks Suzanna Parkinson Greg Ongie Rebecca Willett Ohad Shamir Nathan Srebro MDE 50 2 0 13 Feb 2024
Mirror Descent-Ascent for mean-field min-max problems Razvan-Andrei Lascu Mateusz B. Majka Lukasz Szpruch 27 1 0 12 Feb 2024
Sampling from the Mean-Field Stationary Distribution Yunbum Kook Matthew Shunshi Zhang Sinho Chewi Murat A. Erdogdu Mufan Bill Li 64 7 0 12 Feb 2024
Generalization Error of Graph Neural Networks in the Mean-field Regime Gholamali Aminian Yixuan He Gesine Reinert Lukasz Szpruch Samuel N. Cohen 48 3 0 10 Feb 2024
Asymptotics of feature learning in two-layer networks after one gradient-step Hugo Cui Luca Pesce Yatin Dandi Florent Krzakala Yue M. Lu Lenka Zdeborová Bruno Loureiro MLT 58 16 0 07 Feb 2024
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents Yatin Dandi Emanuele Troiani Luca Arnaboldi Luca Pesce Lenka Zdeborová Florent Krzakala MLT 66 26 0 05 Feb 2024
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features Simone Bombari Marco Mondelli 39 3 0 05 Feb 2024
$C^*$-Algebraic Machine Learning: Moving in a New Direction Yuka Hashimoto Masahiro Ikeda Hachem Kadri 35 2 0 04 Feb 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape Juno Kim Taiji Suzuki 18 18 0 02 Feb 2024
Privacy-preserving data release leveraging optimal transport and particle gradient descent Konstantin Donhauser Javier Abad Neha Hulkund Fanny Yang 41 4 0 31 Jan 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models Namjoon Suh Guang Cheng MedIm 30 12 0 14 Jan 2024
Hidden Minima in Two-Layer ReLU Networks Yossi Arjevani 32 3 0 28 Dec 2023
Mean-field underdamped Langevin dynamics and its spacetime discretization Qiang Fu Ashia Wilson 40 4 0 26 Dec 2023
A note on regularised NTK dynamics with an application to PAC-Bayesian training Eugenio Clerico Benjamin Guedj 33 0 0 20 Dec 2023
Enhancing Neural Training via a Correlated Dynamics Model Jonathan Brokman Roy Betser Rotem Turjeman Tom Berkov I. Cohen Guy Gilboa 24 3 0 20 Dec 2023
A mathematical perspective on Transformers Borjan Geshkovski Cyril Letrouit Yury Polyanskiy Philippe Rigollet EDL AI4CE 42 36 0 17 Dec 2023
FastPart: Over-Parameterized Stochastic Gradient Descent for Sparse optimisation on Measures Yohann De Castro S. Gadat C. Marteau 13 0 0 10 Dec 2023
Learning a Sparse Representation of Barron Functions with the Inverse Scale Space Flow T. J. Heeringa Tim Roith Christoph Brune Martin Burger 18 0 0 05 Dec 2023
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems Juno Kim Kakei Yamamoto Kazusato Oko Zhuoran Yang Taiji Suzuki 34 9 0 02 Dec 2023
The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks Lénaic Chizat Praneeth Netrapalli 20 4 0 30 Nov 2023
A convergence result of a continuous model of deep learning via Łojasiewicz–Simon inequality Noboru Isobe 16 2 0 26 Nov 2023
Eliminating Domain Bias for Federated Learning in Representation Space Jianqing Zhang Yang Hua Jian Cao Hao Wang Tao Song Zhengui Xue Ruhui Ma Haibing Guan FedML 73 33 0 25 Nov 2023
Minimum norm interpolation by perceptra: Explicit regularization and implicit bias Jiyoung Park Ian Pelakh Stephan Wojtowytsch 45 2 0 10 Nov 2023
On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions Simon Martin Francis Bach Giulio Biroli 23 9 0 07 Nov 2023
Minimizing Convex Functionals over Space of Probability Measures via KL Divergence Gradient Flow Rentian Yao Linjun Huang Yun Yang 21 3 0 01 Nov 2023