Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

14 June 2023
Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak
Community: MLT

Papers citing "Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning"

10 / 10 papers shown
Title | Authors | Tags | Metrics | Date
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit | Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim | - | 44 / 4 / 0 | 10 Jan 2025
The Optimization Landscape of SGD Across the Feature Learning Strength | Alexander B. Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan | - | 55 / 2 / 0 | 06 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws | Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan | - | 59 / 12 / 0 | 26 Sep 2024
A Quadratic Synchronization Rule for Distributed Deep Learning | Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang | - | 54 / 1 / 0 | 22 Oct 2023
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng, Yang You | - | 88 / 79 / 0 | 22 May 2023
In-context Learning and Induction Heads | Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah | - | 252 / 474 / 0 | 24 Sep 2022
Stochastic Training is Not Necessary for Generalization | Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein | - | 89 / 72 / 0 | 29 Sep 2021
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization | Stanislaw Jastrzebski, Devansh Arpit, Oliver Åstrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, R. Socher, Kyunghyun Cho, Krzysztof J. Geras | AI4CE | 184 / 66 / 0 | 28 Dec 2020
The large learning rate phase of deep learning: the catapult mechanism | Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari | ODL | 159 / 236 / 0 | 04 Mar 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 310 / 2,896 / 0 | 15 Sep 2016