arXiv:2112.03215
Multi-scale Feature Learning Dynamics: Insights for Double Descent
6 December 2021
Mohammad Pezeshki
Amartya Mitra
Yoshua Bengio
Guillaume Lajoie
Papers citing "Multi-scale Feature Learning Dynamics: Insights for Double Descent" (19 papers)
1. A dynamic view of the double descent. Vivek Shripad Borkar (03 May 2025)
2. Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers. Roman Abramov, Felix Steinbauer, Gjergji Kasneci (29 Apr 2025)
3. On the Relationship Between Double Descent of CNNs and Shape/Texture Bias Under Learning Process. Shun Iwase, Shuya Takahashi, Nakamasa Inoue, Rio Yokota, Ryo Nakamura, Hirokatsu Kataoka (04 Mar 2025)
4. The Fair Language Model Paradox. Andrea Pinto, Tomer Galanti, Randall Balestriero (15 Oct 2024)
5. Unified Neural Network Scaling Laws and Scale-time Equivalence. Akhilan Boopathy, Ila Fiete (09 Sep 2024)
6. Towards understanding epoch-wise double descent in two-layer linear neural networks. Amanda Olmin, Fredrik Lindsten [MLT] (13 Jul 2024)
7. Grokfast: Accelerated Grokking by Amplifying Slow Gradients. Jaerin Lee, Bong Gyun Kang, Kihoon Kim, Kyoung Mu Lee (30 May 2024)
8. No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets. Lorenzo Brigato, S. Mougiakakou (04 Sep 2023)
9. Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy. A. Puli, Lily H. Zhang, Yoav Wald, Rajesh Ranganath (24 Aug 2023)
10. Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok. Pascal Junior Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, G. Dumas (23 Jun 2023)
11. Deep incremental learning models for financial temporal tabular datasets with distribution shifts. Thomas Wong, Mauricio Barahona [OOD, AIFin, AI4TS] (14 Mar 2023)
12. Unifying Grokking and Double Descent. Peter W. Battaglia, David Raposo, Kelsey (10 Mar 2023)
13. Over-training with Mixup May Hurt Generalization. Zixuan Liu, Ziqiao Wang, Hongyu Guo, Yongyi Mao [NoLa] (02 Mar 2023)
14. Grokking phase transitions in learning local rules with gradient descent. Bojan Žunkovič, E. Ilievski (26 Oct 2022)
15. The BUTTER Zone: An Empirical Study of Training Dynamics in Fully Connected Neural Networks. Charles Edison Tripp, J. Perr-Sauer, L. Hayne, M. Lunacek, Jamil Gafur [AI4CE] (25 Jul 2022)
16. Towards Understanding Grokking: An Effective Theory of Representation Learning. Ziming Liu, O. Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams [AI4CE] (20 May 2022)
17. Generalizing similarity in noisy setups: the DIBS phenomenon. Nayara Fonseca, V. Guidetti (30 Jan 2022)
18. Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics. D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka (08 Dec 2020)
19. Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime. Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala (02 Mar 2020)