On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Sanjeev Arora, Nadav Cohen, Elad Hazan · 19 February 2018 · arXiv:1802.06509
Papers citing "On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization" (50 of 119 papers shown)
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
Devon Jarvis, Richard Klein, Benjamin Rosman, Andrew M. Saxe · MLT · 08 Mar 2025

An Invitation to Neuroalgebraic Geometry
G. Marchetti, V. Shahverdi, Stefano Mereta, Matthew Trager, Kathlén Kohn · 31 Jan 2025

Optimization Insights into Deep Diagonal Linear Networks
Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega · 21 Dec 2024

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe · 22 Sep 2024

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation
Can Yaras, Peng Wang, Laura Balzano, Qing Qu · AI4CE · 06 Jun 2024

Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition
Xitong Zhang, Ismail R. Alkhouri, Rongrong Wang · 06 May 2024

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
Aaron Mishkin, Mert Pilanci, Mark Schmidt · 03 Apr 2024

Understanding the Double Descent Phenomenon in Deep Learning
Marc Lafon, Alexandre Thomas · 15 Mar 2024

RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures
Anastasiia Prutianova, Alexey Zaytsev, Chung-Kuei Lee, Fengyu Sun, Ivan Koryakovskiy · MQ · 09 Nov 2023

Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training
Selim F. Yilmaz, Benjamin Ghaemmaghami, A. Singh, Benjamin Cho, Leo Orshansky, Lei Deng, Michael Orshansky · AI4TS · 27 Sep 2023

Critical Learning Periods Emerge Even in Deep Linear Networks
Michael Kleinman, Alessandro Achille, Stefano Soatto · 23 Aug 2023

Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization
Hancheng Min, Enrique Mallada, René Vidal · MLT · 24 Jul 2023

FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning
Chia-Hsiang Kao, Yu-Chiang Frank Wang · FedML · 19 Jul 2023

Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas, Nikolaos Passalis, Anastasios Tefas · AAML, OOD · 14 Jul 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023

ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models
Suzanna Parkinson, Greg Ongie, Rebecca Willett · 24 May 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon · 22 May 2023

Robust Implicit Regularization via Weight Normalization
H. Chou, Holger Rauhut, Rachel A. Ward · 09 May 2023

On the Stepwise Nature of Self-Supervised Learning
James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht · SSL · 27 Mar 2023

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie · 21 Mar 2023

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
Pierre Bréchet, Katerina Papagiannouli, Jing An, Guido Montúfar · 06 Mar 2023

Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities
Aleksandr Beznosikov, Martin Takáč, Alexander Gasnikov · 15 Feb 2023

On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin · 03 Feb 2023

A Survey on Efficient Training of Transformers
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen · 02 Feb 2023

On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba, Hossein Azizpour, Mårten Björkman · 28 Jan 2023

PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, A. Wilson · BDL, MLT, AI4CE · 24 Nov 2022

Mechanistic Mode Connectivity
Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka · 15 Nov 2022

Adaptive Compression for Communication-Efficient Distributed Training
Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtárik · 31 Oct 2022

Symmetries, flat minima, and the conserved quantities of gradient flow
Bo Zhao, I. Ganev, Robin Walters, Rose Yu, Nima Dehmamy · 31 Oct 2022

On the optimization and generalization of overparameterized implicit neural networks
Tianxiang Gao, Hongyang Gao · MLT, AI4CE · 30 Sep 2022

Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation
David Lipshutz, Cengiz Pehlevan, D. Chklovskii · 21 Sep 2022

Deep Linear Networks can Benignly Overfit when Shallow Ones Do
Niladri S. Chatterji, Philip M. Long · 19 Sep 2022

On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi · FedML, AI4CE · 26 Aug 2022

KL-divergence Based Deep Learning for Discrete Time Model
Li Liu, Xiangeng Fang, Di Wang, Weijing Tang, Kevin He · 10 Aug 2022

The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
Andrew M. Saxe, Shagun Sodhani, Sam Lewallen · AI4CE · 21 Jul 2022

Implicit Regularization with Polynomial Growth in Deep Tensor Factorization
Kais Hariz, Hachem Kadri, Stéphane Ayache, Maher Moakher, Thierry Artières · 18 Jul 2022

Utilizing Excess Resources in Training Neural Networks
Amit Henig, Raja Giryes · 12 Jul 2022

q-Learning in Continuous Time
Yanwei Jia, X. Zhou · OffRL · 02 Jul 2022

Non-convex online learning via algorithmic equivalence
Udaya Ghai, Zhou Lu, Elad Hazan · 30 May 2022

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
Michael E. Sander, Pierre Ablin, Gabriel Peyré · 29 May 2022

Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks
Zhiwei Bai, Tao Luo, Z. Xu, Yaoyu Zhang · 26 May 2022

Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width
Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu · 24 May 2022

Symmetry Teleportation for Accelerated Optimization
B. Zhao, Nima Dehmamy, Robin Walters, Rose Yu · ODL · 21 May 2022

RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization
Xintao Wang, Chao Dong, Ying Shan · 11 May 2022

Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs
Aaron Courville, Wei Liu, Kewei Tu · 01 May 2022

Online Convolutional Re-parameterization
Mu Hu, Junyi Feng, Jiashen Hua, Baisheng Lai, Jianqiang Huang, Xiaojin Gong, Xiansheng Hua · 02 Apr 2022

ELLE: Efficient Lifelong Pre-training for Emerging Data
Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou · 12 Mar 2022

Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space
Juncai He, R. Tsai, Rachel A. Ward · 01 Mar 2022

Benefit of Interpolation in Nearest Neighbor Algorithms
Yue Xing, Qifan Song, Guang Cheng · 23 Feb 2022

Understanding Deep Contrastive Learning via Coordinate-wise Optimization
Yuandong Tian · 29 Jan 2022