Feature-Learning Networks Are Consistent Across Widths At Realistic
Scales

Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

28 May 2023

Alexander B. Atanasov

Sabarish Sainathan

Cengiz Pehlevan

Papers citing "Feature-Learning Networks Are Consistent Across Widths At Realistic Scales"

15 / 15 papers shown

Title
Don't be lazy: CompleteP enables compute-efficient deep transformers Nolan Dey Bin Claire Zhang Lorenzo Noci Mufan Li Blake Bordelon Shane Bergsma Cengiz Pehlevan Boris Hanin Joel Hestness 44 0 0 02 May 2025
How Does Critical Batch Size Scale in Pre-training? Hanlin Zhang Depen Morwani Nikhil Vyas Jingfeng Wu Difan Zou Udaya Ghai Dean Phillips Foster Sham Kakade 77 8 0 29 Oct 2024
Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices Chanwoo Chun SueYeon Chung Daniel D. Lee 29 1 0 23 Oct 2024
The Optimization Landscape of SGD Across the Feature Learning Strength Alexander B. Atanasov Alexandru Meterez James B. Simon Cengiz Pehlevan 43 2 0 06 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws Blake Bordelon Alexander B. Atanasov Cengiz Pehlevan 57 12 0 26 Sep 2024
Infinite Limits of Multi-head Transformer Dynamics Blake Bordelon Hamza Tahir Chaudhry Cengiz Pehlevan AI4CE 47 9 0 24 May 2024
PhiNets: Brain-inspired Non-contrastive Learning Based on Temporal Prediction Hypothesis Satoki Ishikawa Makoto Yamada Han Bao Yuki Takezawa 68 0 0 23 May 2024
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks Blake Bordelon Cengiz Pehlevan MLT 38 29 0 06 Apr 2023
In-context Learning and Induction Heads Catherine Olsson Nelson Elhage Neel Nanda Nicholas Joseph Nova Dassarma ... Tom B. Brown Jack Clark Jared Kaplan Sam McCandlish C. Olah 250 460 0 24 Sep 2022
Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks Blake Bordelon Cengiz Pehlevan MLT 40 78 0 19 May 2022
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 264 4,489 0 23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism M. Shoeybi M. Patwary Raul Puri P. LeGresley Jared Casper Bryan Catanzaro MoE 245 1,821 0 17 Sep 2019
Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach Grant M. Rotskoff Eric Vanden-Eijnden 59 118 0 02 May 2018
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles Balaji Lakshminarayanan Alexander Pritzel Charles Blundell UQCV BDL 276 5,661 0 05 Dec 2016
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Z. Tu Kaiming He 297 10,220 0 16 Nov 2016