ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Gradient Descent Finds Global Minima of Deep Neural Networks
arXiv: 1811.03804 · Cited By

9 November 2018
S. Du
Jason D. Lee
Haochuan Li
Liwei Wang
Masayoshi Tomizuka
    ODL

Papers citing "Gradient Descent Finds Global Minima of Deep Neural Networks"

50 / 466 papers shown
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
François Caron
Fadhel Ayed
Paul Jung
Hoileong Lee
Juho Lee
Hongseok Yang
129
2
0
02 Feb 2023
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients
Guihong Li
Yuedong Yang
Kartikeya Bhardwaj
R. Marculescu
117
63
0
26 Jan 2023
Convergence beyond the over-parameterized regime using Rayleigh quotients
David A. R. Robin
Kevin Scaman
Marc Lelarge
60
3
0
19 Jan 2023
Stretched and measured neural predictions of complex network dynamics
V. Vasiliauskaite
Nino Antulov-Fantulin
67
1
0
12 Jan 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang
Boyi Liu
Qi Cai
Lingxiao Wang
Zhaoran Wang
121
13
0
30 Dec 2022
Enhancing Neural Network Differential Equation Solvers
Matthew J. H. Wright
44
1
0
28 Dec 2022
Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization
Daesung Kim
Hye Won Chung
86
2
0
19 Dec 2022
Reconstructing Training Data from Model Gradient, Provably
Zihan Wang
Jason D. Lee
Qi Lei
FedML
116
26
0
07 Dec 2022
Zeroth-Order Alternating Gradient Descent Ascent Algorithms for a Class of Nonconvex-Nonconcave Minimax Problems
Zi Xu
Ziqi Wang
Junlin Wang
Y. Dai
103
11
0
24 Nov 2022
Mechanistic Mode Connectivity
Ekdeep Singh Lubana
Eric J. Bigelow
Robert P. Dick
David M. Krueger
Hidenori Tanaka
118
49
0
15 Nov 2022
Spectral Evolution and Invariance in Linear-width Neural Networks
Zhichao Wang
A. Engel
Anand D. Sarwate
Ioana Dumitriu
Tony Chiang
116
18
0
11 Nov 2022
Finite Sample Identification of Wide Shallow Neural Networks with Biases
M. Fornasier
T. Klock
Marco Mondelli
Michael Rauchensteiner
52
6
0
08 Nov 2022
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks
Zhengdao Chen
Eric Vanden-Eijnden
Joan Bruna
MLT
75
5
0
28 Oct 2022
Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout
Chen Dun
Mirian Hipolito Garcia
C. Jermaine
Dimitrios Dimitriadis
Anastasios Kyrillidis
136
22
0
28 Oct 2022
Bures-Wasserstein Barycenters and Low-Rank Matrix Recovery
Tyler Maunu
Thibaut Le Gouic
Philippe Rigollet
64
5
0
26 Oct 2022
GCT: Gated Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Yun Wang
Wenwu Wang
Dick Botteldooren
60
0
0
22 Oct 2022
When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Jiawei Zhang
Yushun Zhang
Mingyi Hong
Ruoyu Sun
Zhi-Quan Luo
124
10
0
21 Oct 2022
Few-shot Backdoor Attacks via Neural Tangent Kernels
J. Hayase
Sewoong Oh
72
21
0
12 Oct 2022
A Kernel-Based View of Language Model Fine-Tuning
Sadhika Malladi
Alexander Wettig
Dingli Yu
Danqi Chen
Sanjeev Arora
VLM
157
69
0
11 Oct 2022
What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?
Nikolaos Tsilivis
Julia Kempe
AAML
98
20
0
11 Oct 2022
On skip connections and normalisation layers in deep optimisation
L. MacDonald
Jack Valmadre
Hemanth Saratchandran
Simon Lucey
ODL
74
2
0
10 Oct 2022
Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network
Siqi Liang
Y. Sun
F. Liang
BDL
71
11
0
09 Oct 2022
Adaptive Smoothness-weighted Adversarial Training for Multiple Perturbations with Its Stability Analysis
Jiancong Xiao
Zeyu Qin
Yanbo Fan
Baoyuan Wu
Jue Wang
Zhimin Luo
AAML
124
7
0
02 Oct 2022
Improved Algorithms for Neural Active Learning
Yikun Ban
Yuheng Zhang
Hanghang Tong
A. Banerjee
Jingrui He
AI4TS
61
12
0
02 Oct 2022
Restricted Strong Convexity of Deep Learning Models with Smooth Activations
A. Banerjee
Pedro Cisneros-Velarde
Libin Zhu
M. Belkin
73
8
0
29 Sep 2022
Magnitude and Angle Dynamics in Training Single ReLU Neurons
Sangmin Lee
Byeongsu Sim
Jong Chul Ye
MLT
137
6
0
27 Sep 2022
Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$
R. Gentile
G. Welper
ODL
102
7
0
17 Sep 2022
Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)
Zhenyu Zhu
Fanghui Liu
Grigorios G. Chrysos
Volkan Cevher
104
21
0
15 Sep 2022
Generalization Properties of NAS under Activation and Skip Connection Search
Zhenyu Zhu
Fanghui Liu
Grigorios G. Chrysos
Volkan Cevher
AI4CE
90
17
0
15 Sep 2022
Visualizing high-dimensional loss landscapes with Hessian directions
Lucas Böttcher
Gregory R. Wheeler
79
14
0
28 Aug 2022
A Sublinear Adversarial Training Algorithm
Yeqi Gao
Lianke Qin
Zhao Song
Yitan Wang
GAN
77
25
0
10 Aug 2022
Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks
Xin Liu
Wei Tao
Wei Li
Dazhi Zhan
Jun Wang
Zhisong Pan
ODL
76
1
0
08 Aug 2022
Feature selection with gradient descent on two-layer networks in low-rotation regimes
Matus Telgarsky
MLT
81
16
0
04 Aug 2022
Gradient descent provably escapes saddle points in the training of shallow ReLU networks
Patrick Cheridito
Arnulf Jentzen
Florian Rossmannek
103
5
0
03 Aug 2022
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Z. Li
Zixuan Wang
Jian Li
97
47
0
26 Jul 2022
Can we achieve robustness from data alone?
Nikolaos Tsilivis
Jingtong Su
Julia Kempe
OODD
108
18
0
24 Jul 2022
Deep Sequence Models for Text Classification Tasks
S. S. Abdullahi
Su Yiming
Shamsuddeen Hassan Muhammad
A. Mustapha
Ahmad Muhammad Aminu
Abdulkadir Abdullahi
Musa Bello
Saminu Mohammad Aliyu
53
3
0
18 Jul 2022
Efficient Augmentation for Imbalanced Deep Learning
Damien Dablain
C. Bellinger
Bartosz Krawczyk
Nitesh Chawla
66
7
0
13 Jul 2022
Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm
Lechao Xiao
Jeffrey Pennington
101
10
0
11 Jul 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li
Tianhao Wang
Jason D. Lee
Sanjeev Arora
104
29
0
08 Jul 2022
Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity
Jianyi Yang
Shaolei Ren
78
3
0
02 Jul 2022
Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis
Alexander Munteanu
Simon Omlor
Zhao Song
David P. Woodruff
97
15
0
26 Jun 2022
Limitations of the NTK for Understanding Generalization in Deep Learning
Nikhil Vyas
Yamini Bansal
Preetum Nakkiran
116
34
0
20 Jun 2022
On the fast convergence of minibatch heavy ball momentum
Raghu Bollapragada
Tyler Chen
Rachel A. Ward
110
19
0
15 Jun 2022
From Perception to Programs: Regularize, Overparameterize, and Amortize
Hao Tang
Kevin Ellis
NAI
82
10
0
13 Jun 2022
On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms
Lam M. Nguyen
Trang H. Tran
63
2
0
13 Jun 2022
What is a Good Metric to Study Generalization of Minimax Learners?
Asuman Ozdaglar
S. Pattathil
Jiawei Zhang
Kai Zhang
66
14
0
09 Jun 2022
Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks
Huishuai Zhang
Da Yu
Yiping Lu
Di He
AAML
98
1
0
09 Jun 2022
Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime
Benjamin Bowman
Guido Montúfar
82
15
0
06 Jun 2022
The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization
Mufan Li
Mihai Nica
Daniel M. Roy
104
39
0
06 Jun 2022