ResearchTrend.AI
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

12 October 2018
Colin Wei
Jason D. Lee
Qiang Liu
Tengyu Ma

Papers citing "Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel"

50 / 192 papers shown
Embedding principle of homogeneous neural network for classification problem
Jiahan Zhang
Yaoyu Zhang
86
0
0
18 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
446
0
0
01 May 2025
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
Alireza Mousavi-Hosseini
Clayton Sanford
Denny Wu
Murat A. Erdogdu
107
1
0
14 Mar 2025
Learning richness modulates equality reasoning in neural networks
William L. Tong
Cengiz Pehlevan
66
0
0
12 Mar 2025
Low-rank bias, weight decay, and model merging in neural networks
Ilja Kuzborskij
Yasin Abbasi-Yadkori
88
0
0
24 Feb 2025
Robust Feature Learning for Multi-Index Models in High Dimensions
Alireza Mousavi-Hosseini
Adel Javanmard
Murat A. Erdogdu
OOD, AAML
177
1
0
21 Oct 2024
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri
Christos Thrampoulidis
Arya Mazumdar
MLT
123
0
0
13 Oct 2024
Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility
Rajdeep Haldar
Yue Xing
Qifan Song
Guang Lin
56
0
0
09 Oct 2024
COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis
Jipeng Han
110
0
0
02 Oct 2024
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
Alireza Mousavi-Hosseini
Denny Wu
Murat A. Erdogdu
MLT, AI4CE
101
8
0
14 Aug 2024
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
Mohamad Amin Mohamadi
Zhiyuan Li
Lei Wu
Danica J. Sutherland
112
11
0
17 Jul 2024
Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data
Nikita Tsoy
Nikola Konstantinov
80
4
0
27 May 2024
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks
Fanghui Liu
L. Dadi
Volkan Cevher
137
2
0
29 Apr 2024
Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks
Adeyemi Damilare Adeoye
Philipp Christian Petersen
Alberto Bemporad
67
1
0
23 Apr 2024
Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent
Yiwen Kou
Zixiang Chen
Quanquan Gu
Sham Kakade
94
0
0
18 Apr 2024
Decoupled Weight Decay for Any $p$ Norm
N. Outmezguine
Noam Levi
86
3
0
16 Apr 2024
NTK-Guided Few-Shot Class Incremental Learning
Jingren Liu
Zhong Ji
Yanwei Pang
YunLong Yu
CLL
95
4
0
19 Mar 2024
Posterior Uncertainty Quantification in Neural Networks using Data Augmentation
Luhuan Wu
Sinead Williamson
UQCV
91
7
0
18 Mar 2024
Generalization of Scaled Deep ResNets in the Mean-Field Regime
Yihang Chen
Fanghui Liu
Yiping Lu
Grigorios G. Chrysos
Volkan Cevher
73
2
0
14 Mar 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim
Taiji Suzuki
133
24
0
02 Feb 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Namjoon Suh
Guang Cheng
MedIm
109
14
0
14 Jan 2024
A note on regularised NTK dynamics with an application to PAC-Bayesian training
Eugenio Clerico
Benjamin Guedj
112
0
0
20 Dec 2023
Generator Born from Classifier
Runpeng Yu
Xinchao Wang
63
4
0
05 Dec 2023
Optimal Sample Complexity of Contrastive Learning
Noga Alon
Dmitrii Avdiukhin
Dor Elboim
Orr Fischer
G. Yaroslavtsev
SSL
73
7
0
01 Dec 2023
Feature emergence via margin maximization: case studies in algebraic tasks
Depen Morwani
Benjamin L. Edelman
Costin-Andrei Oncescu
Rosie Zhao
Sham Kakade
84
16
0
13 Nov 2023
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu
Yutong Wang
Spencer Frei
Gal Vardi
Wei Hu
MLT
92
28
0
04 Oct 2023
Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data
Xuran Meng
Difan Zou
Yuan Cao
MLT
93
9
0
03 Oct 2023
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
Nuoya Xiong
Lijun Ding
Simon S. Du
126
13
0
03 Oct 2023
SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
Margalit Glasgow
MLT
147
14
0
26 Sep 2023
Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets
Pulkit Gopalani
Samyak Jha
Anirbit Mukherjee
62
2
0
17 Sep 2023
How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent
Mike Nguyen
Nicole Mücke
MLT
84
6
0
14 Sep 2023
Gradient-Based Feature Learning under Structured Data
Alireza Mousavi-Hosseini
Denny Wu
Taiji Suzuki
Murat A. Erdogdu
MLT
107
20
0
07 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
91
8
0
07 Sep 2023
Fast and Multiphase Rates for Nearest Neighbor Classifiers
Pengkun Yang
J.N. Zhang
425
0
0
16 Aug 2023
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning
Nikhil Ghosh
Spencer Frei
Wooseok Ha
Ting Yu
MLT
63
3
0
06 Aug 2023
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Hengyu Fu
Tianyu Guo
Yu Bai
Song Mei
MLT
108
26
0
21 Jul 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen
Zhiyuan Li
Tengyu Ma
FAtt
104
29
0
20 Jul 2023
Complexity Matters: Rethinking the Latent Space for Generative Modeling
Tianyang Hu
Fei Chen
Hong Wang
Jiawei Li
Wei Cao
Jiacheng Sun
Zechao Li
DiffM
120
10
0
17 Jul 2023
Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space
Zhengdao Chen
102
1
0
03 Jul 2023
Continual Learning in Linear Classification on Separable Data
Itay Evron
E. Moroshko
G. Buzaglo
M. Khriesh
B. Marjieh
Nathan Srebro
Daniel Soudry
CLL
79
17
0
06 Jun 2023
The Tunnel Effect: Building Data Representations in Deep Neural Networks
Wojciech Masarczyk
M. Ostaszewski
Ehsan Imani
Razvan Pascanu
Piotr Miłoś
Tomasz Trzciński
92
25
0
31 May 2023
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks
Eshaan Nichani
Alexandru Damian
Jason D. Lee
MLT
201
15
0
11 May 2023
Depth Separation with Multilayer Mean-Field Networks
Y. Ren
Mo Zhou
Rong Ge
OOD
85
3
0
03 Apr 2023
TRAK: Attributing Model Behavior at Scale
Sung Min Park
Kristian Georgiev
Andrew Ilyas
Guillaume Leclerc
Aleksander Madry
TDI
122
156
0
24 Mar 2023
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks
Haozhe Jiang
Kaiyue Wen
Yi-Long Chen
52
0
0
14 Mar 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Weihang Xu
S. Du
108
16
0
20 Feb 2023
Generalization and Stability of Interpolating Neural Networks with Minimal Width
Hossein Taheri
Christos Thrampoulidis
105
16
0
18 Feb 2023
Pruning Before Training May Improve Generalization, Provably
Hongru Yang
Yingbin Liang
Xiaojie Guo
Lingfei Wu
Zhangyang Wang
MLT
64
2
0
01 Jan 2023
Birth-death dynamics for sampling: Global convergence, approximations and their asymptotics
Yulong Lu
D. Slepčev
Lihan Wang
117
25
0
01 Nov 2022
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks
Zhengdao Chen
Eric Vanden-Eijnden
Joan Bruna
MLT
77
5
0
28 Oct 2022