On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

19 February 2018
Sanjeev Arora, Nadav Cohen, Elad Hazan

Papers citing "On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization"

50 / 119 papers shown
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
Devon Jarvis, Richard Klein, Benjamin Rosman, Andrew M. Saxe · MLT · 08 Mar 2025

An Invitation to Neuroalgebraic Geometry
Giovanni Luca Marchetti, Vahid Shahverdi, Stefano Mereta, Matthew Trager, Kathlén Kohn · 31 Jan 2025

Optimization Insights into Deep Diagonal Linear Networks
Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega · 21 Dec 2024

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe · 22 Sep 2024

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation
Can Yaras, Peng Wang, Laura Balzano, Qing Qu · AI4CE · 06 Jun 2024

Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition
Xitong Zhang, Ismail R. Alkhouri, Rongrong Wang · 06 May 2024

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
Aaron Mishkin, Mert Pilanci, Mark Schmidt · 03 Apr 2024

Understanding the Double Descent Phenomenon in Deep Learning
Marc Lafon, Alexandre Thomas · 15 Mar 2024

RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures
Anastasiia Prutianova, Alexey Zaytsev, Chung-Kuei Lee, Fengyu Sun, Ivan Koryakovskiy · MQ · 09 Nov 2023

Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training
Selim F. Yilmaz, Benjamin Ghaemmaghami, A. Singh, Benjamin Cho, Leo Orshansky, Lei Deng, Michael Orshansky · AI4TS · 27 Sep 2023

Critical Learning Periods Emerge Even in Deep Linear Networks
Michael Kleinman, Alessandro Achille, Stefano Soatto · 23 Aug 2023

Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization
Hancheng Min, Enrique Mallada, René Vidal · MLT · 24 Jul 2023

FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning
Chia-Hsiang Kao, Yu-Chiang Frank Wang · FedML · 19 Jul 2023

Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas, Nikolaos Passalis, Anastasios Tefas · AAML, OOD · 14 Jul 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023

ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models
Suzanna Parkinson, Greg Ongie, Rebecca Willett · 24 May 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon · 22 May 2023

Robust Implicit Regularization via Weight Normalization
H. Chou, Holger Rauhut, Rachel A. Ward · 09 May 2023

On the Stepwise Nature of Self-Supervised Learning
James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht · SSL · 27 Mar 2023

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie · 21 Mar 2023

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
Pierre Bréchet, Katerina Papagiannouli, Jing An, Guido Montúfar · 06 Mar 2023

Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities
Aleksandr Beznosikov, Martin Takáč, Alexander Gasnikov · 15 Feb 2023

On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin · 03 Feb 2023

A Survey on Efficient Training of Transformers
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen · 02 Feb 2023

On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba, Hossein Azizpour, Marten Bjorkman · 28 Jan 2023

PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, A. Wilson · BDL, MLT, AI4CE · 24 Nov 2022

Mechanistic Mode Connectivity
Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka · 15 Nov 2022

Adaptive Compression for Communication-Efficient Distributed Training
Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtárik · 31 Oct 2022

Symmetries, flat minima, and the conserved quantities of gradient flow
Bo Zhao, I. Ganev, Robin Walters, Rose Yu, Nima Dehmamy · 31 Oct 2022

On the optimization and generalization of overparameterized implicit neural networks
Tianxiang Gao, Hongyang Gao · MLT, AI4CE · 30 Sep 2022

Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation
David Lipshutz, Cengiz Pehlevan, D. Chklovskii · 21 Sep 2022

Deep Linear Networks can Benignly Overfit when Shallow Ones Do
Niladri S. Chatterji, Philip M. Long · 19 Sep 2022

On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi · FedML, AI4CE · 26 Aug 2022

KL-divergence Based Deep Learning for Discrete Time Model
Li Liu, Xiangeng Fang, Di Wang, Weijing Tang, Kevin He · 10 Aug 2022

The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
Andrew M. Saxe, Shagun Sodhani, Sam Lewallen · AI4CE · 21 Jul 2022

Implicit Regularization with Polynomial Growth in Deep Tensor Factorization
Kais Hariz, Hachem Kadri, Stéphane Ayache, Maher Moakher, Thierry Artières · 18 Jul 2022

Utilizing Excess Resources in Training Neural Networks
Amit Henig, Raja Giryes · 12 Jul 2022

q-Learning in Continuous Time
Yanwei Jia, X. Zhou · OffRL · 02 Jul 2022

Non-convex online learning via algorithmic equivalence
Udaya Ghai, Zhou Lu, Elad Hazan · 30 May 2022

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
Michael E. Sander, Pierre Ablin, Gabriel Peyré · 29 May 2022

Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks
Zhiwei Bai, Tao Luo, Z. Xu, Yaoyu Zhang · 26 May 2022

Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width
Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu · 24 May 2022

Symmetry Teleportation for Accelerated Optimization
B. Zhao, Nima Dehmamy, Robin Walters, Rose Yu · ODL · 21 May 2022

RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization
Xintao Wang, Chao Dong, Ying Shan · 11 May 2022

Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs
Aaron Courville, Wei Liu, Kewei Tu · 01 May 2022

Online Convolutional Re-parameterization
Mu Hu, Junyi Feng, Jiashen Hua, Baisheng Lai, Jianqiang Huang, Xiaojin Gong, Xiansheng Hua · 02 Apr 2022

ELLE: Efficient Lifelong Pre-training for Emerging Data
Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou · 12 Mar 2022

Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space
Juncai He, R. Tsai, Rachel A. Ward · 01 Mar 2022

Benefit of Interpolation in Nearest Neighbor Algorithms
Yue Xing, Qifan Song, Guang Cheng · 23 Feb 2022

Understanding Deep Contrastive Learning via Coordinate-wise Optimization
Yuandong Tian · 29 Jan 2022