Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK
Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang
9 July 2020
MLT
arXiv: 2007.04596

Papers citing "Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK"

50 of 56 citing papers shown.

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
15 Jun 2020

Feature Purification: How Adversarial Training Performs Robust Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
MLT, AAML
20 May 2020

Making Method of Moments Great Again? -- How can GANs learn distributions
Yuanzhi Li, Zehao Dou
GAN
09 Mar 2020

Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
19 Dec 2019

Enhanced Convolutional Neural Tangent Kernels
Zhiyuan Li, Ruosong Wang, Dingli Yu, S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora
03 Nov 2019

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Yu Bai, Jason D. Lee
03 Oct 2019

Finite Depth and Width Corrections to the Neural Tangent Kernel
Boris Hanin, Mihai Nica
MDE
13 Sep 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma
10 Jul 2019

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
Yuan Cao, Quanquan Gu
MLT, AI4CE
30 May 2019

What Can ResNet Learn Efficiently, Going Beyond Kernels?
Zeyuan Allen-Zhu, Yuanzhi Li
24 May 2019

Linearized two-layers neural networks in high dimension
Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
MLT
27 Apr 2019

On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
26 Apr 2019

On the Power and Limitations of Random Features for Understanding Neural Networks
Gilad Yehudai, Ohad Shamir
MLT
01 Apr 2019

Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation
Greg Yang
13 Feb 2019

Towards moderate overparameterization: global convergence guarantees for training shallow neural networks
Samet Oymak, Mahdi Soltanolkotabi
12 Feb 2019

Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Zeyuan Allen-Zhu, Yuanzhi Li
MLT, LRM
04 Feb 2019

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
MLT
24 Jan 2019

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach
19 Dec 2018

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu
ODL
21 Nov 2018

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang
MLT
12 Nov 2018

A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
AI4CE, ODL
09 Nov 2018

Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Masayoshi Tomizuka
ODL
09 Nov 2018

Learning Two Layer Rectified Neural Networks in Polynomial Time
Ainesh Bakshi, Rajesh Jayaram, David P. Woodruff
NoLa
05 Nov 2018

On the Convergence Rate of Training Recurrent Neural Networks
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
29 Oct 2018

Learning Two-layer Neural Networks with Symmetric Inputs
Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang
OOD, MLT
16 Oct 2018

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma
12 Oct 2018

Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh
MLT, ODL
04 Oct 2018

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Yuanzhi Li, Yingyu Liang
MLT
03 Aug 2018

Learning One-hidden-layer ReLU Networks via Gradient Descent
Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu
MLT
20 Jun 2018

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler
20 Jun 2018

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Lénaïc Chizat, Francis R. Bach
OT
24 May 2018

Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds
Santosh Vempala, John Wilmes
MLT
07 May 2018

A Mean Field View of the Landscape of Two-Layers Neural Networks
Song Mei, Andrea Montanari, Phan-Minh Nguyen
MLT
18 Apr 2018

An Alternative View: When Does SGD Escape Local Minima?
Robert D. Kleinberg, Yuanzhi Li, Yang Yuan
MLT
17 Feb 2018

Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
NAI
15 Feb 2018

Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
S. Du, Jason D. Lee, Yuandong Tian, Barnabás Póczós, Aarti Singh
MLT
03 Dec 2017

Learning One-hidden-layer Neural Networks with Landscape Design
Rong Ge, Jason D. Lee, Tengyu Ma
MLT
01 Nov 2017

Theoretical properties of the global optimizer of two layer neural network
Digvijay Boob, Guanghui Lan
MLT
30 Oct 2017

First-order Methods Almost Always Avoid Saddle Points
Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, Benjamin Recht
ODL
20 Oct 2017

Regularizing and Optimizing LSTM Language Models
Stephen Merity, N. Keskar, R. Socher
07 Aug 2017

Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee
16 Jul 2017

On the Optimization Landscape of Tensor Decompositions
Rong Ge, Tengyu Ma
18 Jun 2017

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
Yuanzhi Li, Yingyu Liang
13 Jun 2017

Recovery Guarantees for One-hidden-layer Neural Networks
Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon
MLT
10 Jun 2017

Convergence Analysis of Two-layer Neural Networks with ReLU Activation
Yuanzhi Li, Yang Yuan
MLT
28 May 2017

Convolutional Sequence to Sequence Learning
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin
AIMat
08 May 2017

An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
Yuandong Tian
MLT
02 Mar 2017

Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
Alon Brutzkus, Amir Globerson
MLT
26 Feb 2017

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates
Yuanzhi Li, Yingyu Liang, Andrej Risteski
11 Nov 2016

Diverse Neural Network Learns True Target Functions
Bo Xie, Yingyu Liang, Le Song
09 Nov 2016