Gradient Descent Provably Optimizes Over-parameterized Neural Networks

4 October 2018
S. Du
Xiyu Zhai
Barnabás Póczós
Aarti Singh
MLT, ODL
ArXiv (abs) · PDF · HTML

Papers citing "Gradient Descent Provably Optimizes Over-parameterized Neural Networks"

50 / 882 papers shown
Learning from higher-order statistics, efficiently: hypothesis tests, random features, and neural networks
Eszter Székely
Lorenzo Bardone
Federica Gerace
Sebastian Goldt
79
2
0
22 Dec 2023
Fed-CO2: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning
Zhongyi Cai
Ye-ling Shi
Wei Huang
Jingya Wang
FedML
104
4
0
21 Dec 2023
Improving the Expressive Power of Deep Neural Networks through Integral Activation Transform
Zezhong Zhang
Feng Bao
Guannan Zhang
61
0
0
19 Dec 2023
Lifted RDT based capacity analysis of the 1-hidden layer treelike sign perceptrons neural networks
M. Stojnic
54
1
0
13 Dec 2023
Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds
M. Stojnic
54
4
0
13 Dec 2023
Factor-Assisted Federated Learning for Personalized Optimization with Heterogeneous Data
Feifei Wang
Huiyun Tang
Yang Li
FedML
87
0
0
07 Dec 2023
Rethinking PGD Attack: Is Sign Function Necessary?
Junjie Yang
Tianlong Chen
Xuxi Chen
Zhangyang Wang
Yingbin Liang
AAML
98
1
0
03 Dec 2023
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
92
38
0
30 Nov 2023
The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks
Lénaic Chizat
Praneeth Netrapalli
148
4
0
30 Nov 2023
FedEmb: A Vertical and Hybrid Federated Learning Algorithm using Network And Feature Embedding Aggregation
Fanfei Meng
Lele Zhang
Yu Chen
Yuxin Wang
FedML
91
4
0
30 Nov 2023
Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
Markus Gross
A. Raulf
Christoph Räth
119
0
0
23 Nov 2023
Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets
A. D. Cunha
Francesco d’Amore
Emanuele Natale
MLT
66
1
0
16 Nov 2023
Minimum norm interpolation by perceptra: Explicit regularization and implicit bias
Jiyoung Park
Ian Pelakh
Stephan Wojtowytsch
82
1
0
10 Nov 2023
Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics
Soo Min Kwon
Zekai Zhang
Dogyoon Song
Laura Balzano
Qing Qu
120
4
0
08 Nov 2023
On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions
Simon Martin
Francis Bach
Giulio Biroli
104
11
0
07 Nov 2023
On the Convergence of Encoder-only Shallow Transformers
Yongtao Wu
Fanghui Liu
Grigorios G. Chrysos
Volkan Cevher
86
7
0
02 Nov 2023
Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin
Hattie Zhou
Omid Saremi
Vimal Thilak
Arwen Bradley
Preetum Nakkiran
Josh Susskind
Etai Littwin
79
10
0
31 Oct 2023
Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks
Jiayuan Ye
Zhenyu Zhu
Fanghui Liu
Reza Shokri
Volkan Cevher
87
13
0
31 Oct 2023
Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling
Zhenyu Zhu
Francesco Locatello
Volkan Cevher
81
7
0
27 Oct 2023
A qualitative difference between gradient flows of convex functions in finite- and infinite-dimensional Hilbert spaces
Jonathan W. Siegel
Stephan Wojtowytsch
66
3
0
26 Oct 2023
Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle
Dániel Rácz
Mihaly Petreczky
András Csertán
Bálint Daróczy
MLT
63
1
0
26 Oct 2023
On the Convergence and Sample Complexity Analysis of Deep Q-Networks with ε-Greedy Exploration
Shuai Zhang
Hongkang Li
Meng Wang
Miao Liu
Pin-Yu Chen
Songtao Lu
Sijia Liu
K. Murugesan
Subhajit Chaudhury
111
22
0
24 Oct 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Mehrdad Farajtabar
Sachin Mehta
Mohammad Rastegari
Oncel Tuzel
Hadi Pouransari
VLM
128
72
0
23 Oct 2023
Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection
Jianwei Li
Weizhi Gao
Qi Lei
Dongkuan Xu
66
2
0
19 Oct 2023
An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent
Zhao Song
Chiwun Yang
90
10
0
17 Oct 2023
Neural Tangent Kernels Motivate Graph Neural Networks with Cross-Covariance Graphs
Shervin Khalafi
Saurabh Sihag
Alejandro Ribeiro
70
0
0
16 Oct 2023
Infinite Width Graph Neural Networks for Node Regression/ Classification
Yunus Cobanoglu
AI4CE
67
1
0
12 Oct 2023
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
Behrad Moniri
Donghwan Lee
Hamed Hassani
Yan Sun
MLT
102
23
0
11 Oct 2023
Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach
Shaopeng Fu
Di Wang
AAML
127
2
0
09 Oct 2023
On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks
Xin Liu
Wei Tao
Dazhi Zhan
Yu Pan
Xin Ma
Yu Ding
Zhisong Pan
FedML
70
0
0
09 Oct 2023
How Graph Neural Networks Learn: Lessons from Training Dynamics
Chenxiao Yang
Qitian Wu
David Wipf
Ruoyu Sun
Junchi Yan
AI4CE, GNN
66
1
0
08 Oct 2023
Accelerated Neural Network Training with Rooted Logistic Objectives
Zhu Wang
Praveen Raj Veluswami
Harshit Mishra
Sathya Ravi
64
0
0
05 Oct 2023
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
Youwang Kim
Lee Hyun
Kim Sung-Bin
Suekyeong Nam
Janghoon Ju
Tae-Hyun Oh
CVBM, 3DH
63
3
0
04 Oct 2023
Stochastic Thermodynamics of Learning Parametric Probabilistic Models
S. Parsi
81
0
0
04 Oct 2023
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
Nuoya Xiong
Lijun Ding
Simon S. Du
120
13
0
03 Oct 2023
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian
Yiping Wang
Zhenyu Zhang
Beidi Chen
Simon Shaolei Du
82
41
0
01 Oct 2023
Universality of max-margin classifiers
Andrea Montanari
Feng Ruan
Basil Saeed
Youngtak Sohn
71
5
0
29 Sep 2023
On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective
Jonathan Wenger
Felix Dangel
Agustinus Kristiadi
97
0
0
29 Sep 2023
Sharp Generalization of Transductive Learning: A Transductive Local Rademacher Complexity Approach
Yingzhen Yang
87
4
0
28 Sep 2023
On the Trade-offs between Adversarial Robustness and Actionable Explanations
Satyapriya Krishna
Chirag Agarwal
Himabindu Lakkaraju
AAML
84
0
0
28 Sep 2023
Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks
Yahong Yang
Qipin Chen
Wenrui Hao
66
4
0
26 Sep 2023
Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems
Nathan Buskulic
M. Fadili
Yvain Quéau
93
5
0
21 Sep 2023
Mixture Weight Estimation and Model Prediction in Multi-source Multi-target Domain Adaptation
Yuyang Deng
Ilja Kuzborskij
M. Mahdavi
OOD
65
2
0
19 Sep 2023
Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets
Pulkit Gopalani
Samyak Jha
Anirbit Mukherjee
62
2
0
17 Sep 2023
Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?
Lianke Qin
Zhao Song
Baocheng Sun
100
7
0
14 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
114
29
0
14 Sep 2023
Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding
Shaik Basheeruddin Shah
Pradyumna Pradhan
Wei Pu
Ramunaidu Randhi
Miguel R. D. Rodrigues
Yonina C. Eldar
82
4
0
12 Sep 2023
Approximation Results for Gradient Descent trained Neural Networks
G. Welper
68
1
0
09 Sep 2023
Optimal Rate of Kernel Regression in Large Dimensions
Weihao Lu
Hao Zhang
Yicheng Li
Manyun Xu
Qian Lin
93
6
0
08 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
91
8
0
07 Sep 2023