Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

18 February 2019

Jascha Narain Sohl-Dickstein

Jeffrey Pennington


Papers citing "Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent"

50 / 261 papers shown

Quantifying Epistemic Uncertainty in Deep Learning Ziyi Huang H. Lam Haofeng Zhang UQCV BDL UD PER 24 12 0 23 Oct 2021
Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning Soufiane Hayou Bo He Gintare Karolina Dziugaite 37 2 0 22 Oct 2021
Deep Active Learning by Leveraging Training Dynamics Haonan Wang Wei Huang Ziwei Wu A. Margenot Hanghang Tong Jingrui He AI4CE 27 33 0 16 Oct 2021
AIR-Net: Adaptive and Implicit Regularization Neural Network for Matrix Completion Zhemin Li Tao Sun Hongxia Wang Bao Wang 50 6 0 12 Oct 2021
New Insights into Graph Convolutional Networks using Neural Tangent Kernels Mahalakshmi Sabanayagam P. Esser D. Ghoshdastidar 26 6 0 08 Oct 2021
Improved architectures and training algorithms for deep operator networks Sifan Wang Hanwen Wang P. Perdikaris AI4CE 52 105 0 04 Oct 2021
Fast and Sample-Efficient Interatomic Neural Network Potentials for Molecules and Materials Based on Gaussian Moments Viktor Zaverkin David Holzmüller Ingo Steinwart Johannes Kastner 29 19 0 20 Sep 2021
NASI: Label- and Data-agnostic Neural Architecture Search at Initialization Yao Shu Shaofeng Cai Zhongxiang Dai Beng Chin Ooi K. H. Low 22 43 0 02 Sep 2021
On Accelerating Distributed Convex Optimizations Kushal Chakrabarti Nirupam Gupta Nikhil Chopra 29 7 0 19 Aug 2021
Dataset Distillation with Infinitely Wide Convolutional Networks Timothy Nguyen Roman Novak Lechao Xiao Jaehoon Lee DD 51 229 0 27 Jul 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion D. Kunin Javier Sagastuy-Breña Lauren Gillespie Eshed Margalit Hidenori Tanaka Surya Ganguli Daniel L. K. Yamins 31 15 0 19 Jul 2021
The Values Encoded in Machine Learning Research Abeba Birhane Pratyusha Kalluri Dallas Card William Agnew Ravit Dotan Michelle Bao 35 274 0 29 Jun 2021
Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation Haoxiang Wang Han Zhao Bo-wen Li 37 88 0 16 Jun 2021
Locality defeats the curse of dimensionality in convolutional teacher-student scenarios Alessandro Favero Francesco Cagnetta M. Wyart 30 31 0 16 Jun 2021
How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective Akhilan Boopathy Ila Fiete 27 9 0 15 Jun 2021
What can linearized neural networks actually say about generalization? Guillermo Ortiz-Jiménez Seyed-Mohsen Moosavi-Dezfooli P. Frossard 29 43 0 12 Jun 2021
The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective Geoff Pleiss John P. Cunningham 28 24 0 11 Jun 2021
A Neural Tangent Kernel Perspective of GANs Jean-Yves Franceschi Emmanuel de Bézenac Ibrahim Ayed Mickaël Chen Sylvain Lamprier Patrick Gallinari 34 26 0 10 Jun 2021
A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs Gadi Naveh Zohar Ringel SSL MLT 36 31 0 08 Jun 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization Mufan Li Mihai Nica Daniel M. Roy 32 33 0 07 Jun 2021
Priors in Bayesian Deep Learning: A Review Vincent Fortuin UQCV BDL 31 124 0 14 May 2021
Global Convergence of Three-layer Neural Networks in the Mean Field Regime H. Pham Phan-Minh Nguyen MLT AI4CE 41 19 0 11 May 2021
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes James Lucas Juhan Bae Michael Ruogu Zhang Stanislav Fort R. Zemel Roger C. Grosse MoMe 164 28 0 22 Apr 2021
Unsupervised Shape Completion via Deep Prior in the Neural Tangent Kernel Perspective Lei Chu Hao Pan Wenping Wang 3DPC 34 11 0 19 Apr 2021
Fast Adaptation with Linearized Neural Networks Wesley J. Maddox Shuai Tang Pablo G. Moreno A. Wilson Andreas C. Damianou 32 32 0 02 Mar 2021
Computing the Information Content of Trained Neural Networks Jeremy Bernstein Yisong Yue 27 4 0 01 Mar 2021
Experiments with Rich Regime Training for Deep Learning Xinyan Li A. Banerjee 32 2 0 26 Feb 2021
Provable Super-Convergence with a Large Cyclical Learning Rate Samet Oymak 33 12 0 22 Feb 2021
Explaining Neural Scaling Laws Yasaman Bahri Ethan Dyer Jared Kaplan Jaehoon Lee Utkarsh Sharma 27 250 0 12 Feb 2021
A linearized framework and a new benchmark for model selection for fine-tuning Aditya Deshpande Alessandro Achille Avinash Ravichandran Hao Li L. Zancato Charless C. Fowlkes Rahul Bhotika Stefano Soatto Pietro Perona ALM 118 46 0 29 Jan 2021
Estimating informativeness of samples with Smooth Unique Information Hrayr Harutyunyan Alessandro Achille Giovanni Paolini Orchid Majumder Avinash Ravichandran Rahul Bhotika Stefano Soatto 27 24 0 17 Jan 2021
Reproducing Activation Function for Deep Learning Senwei Liang Liyao Lyu Chunmei Wang Haizhao Yang 36 21 0 13 Jan 2021
LQF: Linear Quadratic Fine-Tuning Alessandro Achille Aditya Golatkar Avinash Ravichandran M. Polito Stefano Soatto 29 27 0 21 Dec 2020
On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks Sifan Wang Hanwen Wang P. Perdikaris 131 439 0 18 Dec 2020
Faster Non-Convex Federated Learning via Global and Local Momentum Rudrajit Das Anish Acharya Abolfazl Hashemi Sujay Sanghavi Inderjit S. Dhillon Ufuk Topcu FedML 37 82 0 07 Dec 2020
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent Kangqiao Liu Liu Ziyin Masakuni Ueda MLT 61 37 0 07 Dec 2020
Fourier-domain Variational Formulation and Its Well-posedness for Supervised Learning Tao Luo Zheng Ma Zhiwei Wang Zhi-Qin John Xu Yaoyu Zhang OOD 44 4 0 06 Dec 2020
Gradient Starvation: A Learning Proclivity in Neural Networks Mohammad Pezeshki Sekouba Kaba Yoshua Bengio Aaron Courville Doina Precup Guillaume Lajoie MLT 50 257 0 18 Nov 2020
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces Zhuoran Yang Chi Jin Zhaoran Wang Mengdi Wang Michael I. Jordan 39 18 0 09 Nov 2020
Dataset Meta-Learning from Kernel Ridge-Regression Timothy Nguyen Zhourong Chen Jaehoon Lee DD 36 240 0 30 Oct 2020
Scaling Laws for Autoregressive Generative Modeling T. Henighan Jared Kaplan Mor Katz Mark Chen Christopher Hesse ... Nick Ryder Daniel M. Ziegler John Schulman Dario Amodei Sam McCandlish 32 405 0 28 Oct 2020
Are wider nets better given the same number of parameters? A. Golubeva Behnam Neyshabur Guy Gur-Ari 27 44 0 27 Oct 2020
Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models J. Rocks Pankaj Mehta 18 41 0 26 Oct 2020
A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks Zhiqi Bu Shiyun Xu Kan Chen 33 17 0 25 Oct 2020
Stable ResNet Soufiane Hayou Eugenio Clerico Bo He George Deligiannidis Arnaud Doucet Judith Rousseau ODL SSeg 46 51 0 24 Oct 2020
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime Andrea Agazzi Jianfeng Lu 13 15 0 22 Oct 2020
Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher Guangda Ji Zhanxing Zhu 59 42 0 20 Oct 2020
A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix T. Doan Mehdi Abbana Bennani Bogdan Mazoure Guillaume Rabusseau Pierre Alquier CLL 20 80 0 07 Oct 2020
On the linearity of large non-linear models: when and why the tangent kernel is constant Chaoyue Liu Libin Zhu M. Belkin 21 140 0 02 Oct 2020
Tensor Programs III: Neural Matrix Laws Greg Yang 14 43 0 22 Sep 2020