The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
15 April 2019 · arXiv:1904.06963
Karthik A. Sankararaman, Soham De, Zheng Xu, W. R. Huang, Tom Goldstein
Tags: ODL
Papers citing "The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent" (27 of 27 papers shown)
| Title | Authors | Tags | Metrics | Date |
|---|---|---|---|---|
| SpINR: Neural Volumetric Reconstruction for FMCW Radars | Harshvardhan Takawale, Nirupam Roy |  | 30 / 0 / 0 | 30 Mar 2025 |
| High-Fidelity Neural Phonetic Posteriorgrams | Cameron Churchwell, Max Morrison, Bryan Pardo |  | 38 / 4 / 0 | 27 Feb 2024 |
| The Dormant Neuron Phenomenon in Deep Reinforcement Learning | Ghada Sokar, Rishabh Agarwal, P. S. Castro, Utku Evci | CLL | 40 / 88 / 0 | 24 Feb 2023 |
| Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics | J. Lazzari, Xiuwen Liu |  | 24 / 3 / 0 | 14 Jan 2023 |
| On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning | S. Takagi | OffRL | 18 / 7 / 0 | 17 Nov 2022 |
| On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms | Lam M. Nguyen, Trang H. Tran |  | 32 / 2 / 0 | 13 Jun 2022 |
| Convergence of gradient descent for deep neural networks | S. Chatterjee | ODL | 21 / 20 / 0 | 30 Mar 2022 |
| Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions | Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa | MLT | 34 / 6 / 0 | 13 Dec 2021 |
| Efficient and Private Federated Learning with Partially Trainable Networks | Hakim Sidahmed, Zheng Xu, Ankush Garg, Yuan Cao, Mingqing Chen | FedML | 49 / 13 / 0 | 06 Oct 2021 |
| Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization | Alexandre Ramé, Corentin Dancette, Matthieu Cord | OOD | 38 / 204 / 0 | 07 Sep 2021 |
| A general sample complexity analysis of vanilla policy gradient | Rui Yuan, Robert Mansel Gower, A. Lazaric |  | 69 / 62 / 0 | 23 Jul 2021 |
| A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions | Arnulf Jentzen, Adrian Riekert | MLT | 32 / 13 / 0 | 01 Apr 2021 |
| Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability | Wei-Tsung Kao, Hung-yi Lee |  | 16 / 16 / 0 | 12 Mar 2021 |
| Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases | Arnulf Jentzen, T. Kröger | ODL | 28 / 7 / 0 | 23 Feb 2021 |
| A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions | Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek |  | 23 / 24 / 0 | 19 Feb 2021 |
| DOTS: Decoupling Operation and Topology in Differentiable Architecture Search | Yuchao Gu, Li-Juan Wang, Yun-Hai Liu, Yi Yang, Yu-Huan Wu, Shao-Ping Lu, Ming-Ming Cheng |  | 29 / 48 / 0 | 02 Oct 2020 |
| Rethinking Bottleneck Structure for Efficient Mobile Network Design | Zhou Daquan, Qibin Hou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan |  | 24 / 197 / 0 | 05 Jul 2020 |
| Non-convergence of stochastic gradient descent in the training of deep neural networks | Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek |  | 14 / 37 / 0 | 12 Jun 2020 |
| Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization | S. Chatterjee | ODL, OOD | 11 / 48 / 0 | 25 Feb 2020 |
| Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks | Soham De, Samuel L. Smith | ODL | 14 / 20 / 0 | 24 Feb 2020 |
| Better Theory for SGD in the Nonconvex World | Ahmed Khaled, Peter Richtárik |  | 13 / 178 / 0 | 09 Feb 2020 |
| Exploiting Operation Importance for Differentiable Neural Architecture Search | Xukai Xie, Yuan Zhou, S. Kung |  | 17 / 34 / 0 | 24 Nov 2019 |
| A type of generalization error induced by initialization in deep neural networks | Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma |  | 9 / 49 / 0 | 19 May 2019 |
| Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks | Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington |  | 220 / 348 / 0 | 14 Jun 2018 |
| Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition | Hamed Karimi, J. Nutini, Mark W. Schmidt |  | 139 / 1,199 / 0 | 16 Aug 2016 |
| Benefits of depth in neural networks | Matus Telgarsky |  | 142 / 602 / 0 | 14 Feb 2016 |
| Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes | Ohad Shamir, Tong Zhang |  | 101 / 570 / 0 | 08 Dec 2012 |