1810.05369
Cited By
v4 (latest)
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
12 October 2018
Colin Wei
Jason D. Lee
Qiang Liu
Tengyu Ma
Papers citing
"Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel"
50 / 192 papers shown
Title
Embedding principle of homogeneous neural network for classification problem
Jiahan Zhang
Yaoyu Zhang
86
0
0
18 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
446
0
0
01 May 2025
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
Alireza Mousavi-Hosseini
Clayton Sanford
Denny Wu
Murat A. Erdogdu
107
1
0
14 Mar 2025
Learning richness modulates equality reasoning in neural networks
William L. Tong
Cengiz Pehlevan
66
0
0
12 Mar 2025
Low-rank bias, weight decay, and model merging in neural networks
Ilja Kuzborskij
Yasin Abbasi-Yadkori
88
0
0
24 Feb 2025
Robust Feature Learning for Multi-Index Models in High Dimensions
Alireza Mousavi-Hosseini
Adel Javanmard
Murat A. Erdogdu
OOD
AAML
177
1
0
21 Oct 2024
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri
Christos Thrampoulidis
Arya Mazumdar
MLT
123
0
0
13 Oct 2024
Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility
Rajdeep Haldar
Yue Xing
Qifan Song
Guang Lin
56
0
0
09 Oct 2024
COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis
Jipeng Han
110
0
0
02 Oct 2024
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
Alireza Mousavi-Hosseini
Denny Wu
Murat A. Erdogdu
MLT
AI4CE
101
8
0
14 Aug 2024
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
Mohamad Amin Mohamadi
Zhiyuan Li
Lei Wu
Danica J. Sutherland
112
11
0
17 Jul 2024
Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data
Nikita Tsoy
Nikola Konstantinov
80
4
0
27 May 2024
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks
Fanghui Liu
L. Dadi
Volkan Cevher
137
2
0
29 Apr 2024
Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks
Adeyemi Damilare Adeoye
Philipp Christian Petersen
Alberto Bemporad
67
1
0
23 Apr 2024
Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent
Yiwen Kou
Zixiang Chen
Quanquan Gu
Sham Kakade
94
0
0
18 Apr 2024
Decoupled Weight Decay for Any p Norm
N. Outmezguine
Noam Levi
86
3
0
16 Apr 2024
NTK-Guided Few-Shot Class Incremental Learning
Jingren Liu
Zhong Ji
Yanwei Pang
YunLong Yu
CLL
95
4
0
19 Mar 2024
Posterior Uncertainty Quantification in Neural Networks using Data Augmentation
Luhuan Wu
Sinead Williamson
UQCV
91
7
0
18 Mar 2024
Generalization of Scaled Deep ResNets in the Mean-Field Regime
Yihang Chen
Fanghui Liu
Yiping Lu
Grigorios G. Chrysos
Volkan Cevher
73
2
0
14 Mar 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim
Taiji Suzuki
133
24
0
02 Feb 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Namjoon Suh
Guang Cheng
MedIm
109
14
0
14 Jan 2024
A note on regularised NTK dynamics with an application to PAC-Bayesian training
Eugenio Clerico
Benjamin Guedj
112
0
0
20 Dec 2023
Generator Born from Classifier
Runpeng Yu
Xinchao Wang
63
4
0
05 Dec 2023
Optimal Sample Complexity of Contrastive Learning
Noga Alon
Dmitrii Avdiukhin
Dor Elboim
Orr Fischer
G. Yaroslavtsev
SSL
73
7
0
01 Dec 2023
Feature emergence via margin maximization: case studies in algebraic tasks
Depen Morwani
Benjamin L. Edelman
Costin-Andrei Oncescu
Rosie Zhao
Sham Kakade
84
16
0
13 Nov 2023
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu
Yutong Wang
Spencer Frei
Gal Vardi
Wei Hu
MLT
92
28
0
04 Oct 2023
Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data
Xuran Meng
Difan Zou
Yuan Cao
MLT
93
9
0
03 Oct 2023
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
Nuoya Xiong
Lijun Ding
Simon S. Du
126
13
0
03 Oct 2023
SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
Margalit Glasgow
MLT
147
14
0
26 Sep 2023
Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets
Pulkit Gopalani
Samyak Jha
Anirbit Mukherjee
62
2
0
17 Sep 2023
How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent
Mike Nguyen
Nicole Mücke
MLT
84
6
0
14 Sep 2023
Gradient-Based Feature Learning under Structured Data
Alireza Mousavi-Hosseini
Denny Wu
Taiji Suzuki
Murat A. Erdogdu
MLT
107
20
0
07 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
91
8
0
07 Sep 2023
Fast and Multiphase Rates for Nearest Neighbor Classifiers
Pengkun Yang
J.N. Zhang
425
0
0
16 Aug 2023
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning
Nikhil Ghosh
Spencer Frei
Wooseok Ha
Ting Yu
MLT
63
3
0
06 Aug 2023
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Hengyu Fu
Tianyu Guo
Yu Bai
Song Mei
MLT
108
26
0
21 Jul 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen
Zhiyuan Li
Tengyu Ma
FAtt
104
29
0
20 Jul 2023
Complexity Matters: Rethinking the Latent Space for Generative Modeling
Tianyang Hu
Fei Chen
Hong Wang
Jiawei Li
Wei Cao
Jiacheng Sun
Zechao Li
DiffM
120
10
0
17 Jul 2023
Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space
Zhengdao Chen
102
1
0
03 Jul 2023
Continual Learning in Linear Classification on Separable Data
Itay Evron
E. Moroshko
G. Buzaglo
M. Khriesh
B. Marjieh
Nathan Srebro
Daniel Soudry
CLL
79
17
0
06 Jun 2023
The Tunnel Effect: Building Data Representations in Deep Neural Networks
Wojciech Masarczyk
M. Ostaszewski
Ehsan Imani
Razvan Pascanu
Piotr Miłoś
Tomasz Trzciński
92
25
0
31 May 2023
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks
Eshaan Nichani
Alexandru Damian
Jason D. Lee
MLT
201
15
0
11 May 2023
Depth Separation with Multilayer Mean-Field Networks
Y. Ren
Mo Zhou
Rong Ge
OOD
85
3
0
03 Apr 2023
TRAK: Attributing Model Behavior at Scale
Sung Min Park
Kristian Georgiev
Andrew Ilyas
Guillaume Leclerc
Aleksander Madry
TDI
122
156
0
24 Mar 2023
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks
Haozhe Jiang
Kaiyue Wen
Yi-Long Chen
52
0
0
14 Mar 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Weihang Xu
S. Du
108
16
0
20 Feb 2023
Generalization and Stability of Interpolating Neural Networks with Minimal Width
Hossein Taheri
Christos Thrampoulidis
105
16
0
18 Feb 2023
Pruning Before Training May Improve Generalization, Provably
Hongru Yang
Yingbin Liang
Xiaojie Guo
Lingfei Wu
Zhangyang Wang
MLT
64
2
0
01 Jan 2023
Birth-death dynamics for sampling: Global convergence, approximations and their asymptotics
Yulong Lu
D. Slepčev
Lihan Wang
117
25
0
01 Nov 2022
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks
Zhengdao Chen
Eric Vanden-Eijnden
Joan Bruna
MLT
77
5
0
28 Oct 2022