Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

12 October 2018

Papers citing "Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel"

50 / 192 papers shown

Title
Embedding principle of homogeneous neural network for classification problem Jiahan Zhang Yaoyu Zhang 86 0 0 18 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection Xinyue Zeng Haohui Wang Junhong Lin Jun Wu Tyler Cody Dawei Zhou 446 0 0 01 May 2025
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective Alireza Mousavi-Hosseini Clayton Sanford Denny Wu Murat A. Erdogdu 107 1 0 14 Mar 2025
Learning richness modulates equality reasoning in neural networks William L. Tong Cengiz Pehlevan 66 0 0 12 Mar 2025
Low-rank bias, weight decay, and model merging in neural networks Ilja Kuzborskij Yasin Abbasi-Yadkori 88 0 0 24 Feb 2025
Robust Feature Learning for Multi-Index Models in High Dimensions Alireza Mousavi-Hosseini Adel Javanmard Murat A. Erdogdu OOD AAML 177 1 0 21 Oct 2024
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods Hossein Taheri Christos Thrampoulidis Arya Mazumdar MLT 123 0 0 13 Oct 2024
Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility Rajdeep Haldar Yue Xing Qifan Song Guang Lin 56 0 0 09 Oct 2024
COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis Jipeng Han 110 0 0 02 Oct 2024
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics Alireza Mousavi-Hosseini Denny Wu Murat A. Erdogdu MLT AI4CE 101 8 0 14 Aug 2024
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition Mohamad Amin Mohamadi Zhiyuan Li Lei Wu Danica J. Sutherland 112 11 0 17 Jul 2024
Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data Nikita Tsoy Nikola Konstantinov 80 4 0 27 May 2024
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks Fanghui Liu L. Dadi Volkan Cevher 137 2 0 29 Apr 2024
Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks Adeyemi Damilare Adeoye Philipp Christian Petersen Alberto Bemporad 67 1 0 23 Apr 2024
Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent Yiwen Kou Zixiang Chen Quanquan Gu Sham Kakade 94 0 0 18 Apr 2024
Decoupled Weight Decay for Any $p$ Norm N. Outmezguine Noam Levi 86 3 0 16 Apr 2024
NTK-Guided Few-Shot Class Incremental Learning Jingren Liu Zhong Ji Yanwei Pang YunLong Yu CLL 95 4 0 19 Mar 2024
Posterior Uncertainty Quantification in Neural Networks using Data Augmentation Luhuan Wu Sinead Williamson UQCV 91 7 0 18 Mar 2024
Generalization of Scaled Deep ResNets in the Mean-Field Regime Yihang Chen Fanghui Liu Yiping Lu Grigorios G. Chrysos Volkan Cevher 73 2 0 14 Mar 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape Juno Kim Taiji Suzuki 133 24 0 02 Feb 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models Namjoon Suh Guang Cheng MedIm 109 14 0 14 Jan 2024
A note on regularised NTK dynamics with an application to PAC-Bayesian training Eugenio Clerico Benjamin Guedj 112 0 0 20 Dec 2023
Generator Born from Classifier Runpeng Yu Xinchao Wang 63 4 0 05 Dec 2023
Optimal Sample Complexity of Contrastive Learning Noga Alon Dmitrii Avdiukhin Dor Elboim Orr Fischer G. Yaroslavtsev SSL 73 7 0 01 Dec 2023
Feature emergence via margin maximization: case studies in algebraic tasks Depen Morwani Benjamin L. Edelman Costin-Andrei Oncescu Rosie Zhao Sham Kakade 84 16 0 13 Nov 2023
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data Zhiwei Xu Yutong Wang Spencer Frei Gal Vardi Wei Hu MLT 92 28 0 04 Oct 2023
Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data Xuran Meng Difan Zou Yuan Cao MLT 93 9 0 03 Oct 2023
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization Nuoya Xiong Lijun Ding Simon S. Du 126 13 0 03 Oct 2023
SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem Margalit Glasgow MLT 147 14 0 26 Sep 2023
Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets Pulkit Gopalani Samyak Jha Anirbit Mukherjee 62 2 0 17 Sep 2023
How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent Mike Nguyen Nicole Mücke MLT 84 6 0 14 Sep 2023
Gradient-Based Feature Learning under Structured Data Alireza Mousavi-Hosseini Denny Wu Taiji Suzuki Murat A. Erdogdu MLT 107 20 0 07 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck Benjamin L. Edelman Surbhi Goel Sham Kakade Eran Malach Cyril Zhang 91 8 0 07 Sep 2023
Fast and Multiphase Rates for Nearest Neighbor Classifiers Pengkun Yang J.N. Zhang 425 0 0 16 Aug 2023
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning Nikhil Ghosh Spencer Frei Wooseok Ha Ting Yu MLT 63 3 0 06 Aug 2023
What can a Single Attention Layer Learn? A Study Through the Random Features Lens Hengyu Fu Tianyu Guo Yu Bai Song Mei MLT 108 26 0 21 Jul 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization Kaiyue Wen Zhiyuan Li Tengyu Ma FAtt 104 29 0 20 Jul 2023
Complexity Matters: Rethinking the Latent Space for Generative Modeling Tianyang Hu Fei Chen Hong Wang Jiawei Li Wei Cao Jiacheng Sun Zechao Li DiffM 120 10 0 17 Jul 2023
Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space Zhengdao Chen 102 1 0 03 Jul 2023
Continual Learning in Linear Classification on Separable Data Itay Evron E. Moroshko G. Buzaglo M. Khriesh B. Marjieh Nathan Srebro Daniel Soudry CLL 79 17 0 06 Jun 2023
The Tunnel Effect: Building Data Representations in Deep Neural Networks Wojciech Masarczyk M. Ostaszewski Ehsan Imani Razvan Pascanu Piotr Miłoś Tomasz Trzciński 92 25 0 31 May 2023
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks Eshaan Nichani Alexandru Damian Jason D. Lee MLT 201 15 0 11 May 2023
Depth Separation with Multilayer Mean-Field Networks Y. Ren Mo Zhou Rong Ge OOD 85 3 0 03 Apr 2023
TRAK: Attributing Model Behavior at Scale Sung Min Park Kristian Georgiev Andrew Ilyas Guillaume Leclerc Aleksander Madry TDI 122 156 0 24 Mar 2023
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks Haozhe Jiang Kaiyue Wen Yi-Long Chen 52 0 0 14 Mar 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron Weihang Xu S. Du 108 16 0 20 Feb 2023
Generalization and Stability of Interpolating Neural Networks with Minimal Width Hossein Taheri Christos Thrampoulidis 105 16 0 18 Feb 2023
Pruning Before Training May Improve Generalization, Provably Hongru Yang Yingbin Liang Xiaojie Guo Lingfei Wu Zhangyang Wang MLT 64 2 0 01 Jan 2023
Birth-death dynamics for sampling: Global convergence, approximations and their asymptotics Yulong Lu D. Slepčev Lihan Wang 117 25 0 01 Nov 2022
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks Zhengdao Chen Eric Vanden-Eijnden Joan Bruna MLT 77 5 0 28 Oct 2022