Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?
Boris Hanin
arXiv:1801.03744 · 11 January 2018
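
For context, the instability named in the title is easy to reproduce numerically. Below is a minimal NumPy sketch (illustrative only; not code from the paper, and the function name is made up) that backpropagates a unit gradient through a deep fully connected ReLU network at initialization. With weights drawn below the He scale sqrt(2/fan_in) the gradient vanishes exponentially in depth, above it the gradient explodes, and at the He scale its typical size stays of order one.

    import numpy as np

    rng = np.random.default_rng(0)

    def input_grad_norm(depth, width, weight_std):
        """Forward one random input through `depth` ReLU layers, then
        backpropagate a unit output gradient and return its norm at the input."""
        x = rng.standard_normal(width)
        layers = []
        for _ in range(depth):
            W = weight_std * rng.standard_normal((width, width))
            pre = W @ x
            mask = (pre > 0).astype(float)   # ReLU derivative at this layer
            x = pre * mask                   # ReLU activation
            layers.append((W, mask))
        g = rng.standard_normal(width)
        g /= np.linalg.norm(g)               # unit gradient at the output
        for W, mask in reversed(layers):
            g = W.T @ (g * mask)             # chain rule through relu(W @ x)
        return np.linalg.norm(g)

    width, depth = 64, 50
    he = np.sqrt(2.0 / width)                # He initialization scale
    for label, std in [("0.8*He", 0.8 * he), ("He", he), ("1.2*He", 1.2 * he)]:
        norms = [input_grad_norm(depth, width, std) for _ in range(20)]
        print(f"{label:>7}: median gradient norm = {np.median(norms):.3e}")

The paper sharpens this picture: even at the critical He scaling, where the mean squared gradient is depth-independent, the run-to-run fluctuations of the gradient grow with the sum of the reciprocals of the layer widths, which is why the He-scaled runs above still vary noticeably across samples.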

Papers citing "Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?"

All 50 citing papers are shown below, newest first.
• HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models. Zheng Lin, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Praneeth Vepakomma, Wei Ni, Jun Luo, Yue Gao. 05 May 2025. [MoE]
• Low-Loss Space in Neural Networks is Continuous and Fully Connected. Yongding Tian, Zaid Al-Ars, Maksim Kitsak, P. Hofstee. 05 May 2025. [3DPC]
• Don't be lazy: CompleteP enables compute-efficient deep transformers. Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, Joel Hestness. 02 May 2025.
• Deep Neural Nets as Hamiltonians. Mike Winer, Boris Hanin. 31 Mar 2025.
• FAIR: Facilitating Artificial Intelligence Resilience in Manufacturing Industrial Internet. Yingyan Zeng, Ismini Lourentzou, Xinwei Deng, R. Jin. 03 Mar 2025. [AI4CE]
• Federated Learning with Flexible Architectures. Jong-Ik Park, Carlee Joe-Wong. 14 Jun 2024. [FedML]
• Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey. Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu. 09 Apr 2024. [GNN]
• Quantitative CLTs in Deep Neural Networks. Stefano Favaro, Boris Hanin, Domenico Marinucci, I. Nourdin, G. Peccati. 12 Jul 2023. [BDL]
• Intelligent gradient amplification for deep neural networks. S. Basodi, K. Pusuluri, Xueli Xiao, Yi Pan. 29 May 2023. [ODL]
• Depth Dependence of μP Learning Rates in ReLU MLPs. Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar. 13 May 2023.
• A Neural Emulator for Uncertainty Estimation of Fire Propagation. Andrew Bolt, Conrad Sanderson, J. Dabrowski, C. Huston, Petra Kuhnert. 10 May 2023.
• On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee. Chenyang Li, Jihoon Chung, Mengnan Du, Haimin Wang, Xianlian Zhou, Bohao Shen. 13 Mar 2023.
• Error convergence and engineering-guided hyperparameter search of PINNs: towards optimized I-FENN performance. Panos Pantidis, Habiba Eldababy, Christopher Miguel Tagle, M. Mobasher. 03 Mar 2023.
• Expected Gradients of Maxout Networks and Consequences to Parameter Initialization. Hanna Tseran, Guido Montúfar. 17 Jan 2023. [ODL]
• Langevin algorithms for very deep Neural Networks with application to image classification. Pierre Bras. 27 Dec 2022.
• Langevin algorithms for Markovian Neural Networks and Deep Stochastic control. Pierre Bras, Gilles Pagès. 22 Dec 2022.
• Accelerating Dataset Distillation via Model Augmentation. Lei Zhang, Jie M. Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Heng Chang, Dongkuan Xu. 12 Dec 2022. [DD]
• Dynamical Isometry for Residual Networks. Advait Gadhikar, R. Burkholz. 05 Oct 2022. [ODL, AI4CE]
• Graph Neural Networks Extract High-Resolution Cultivated Land Maps from Sentinel-2 Image Series. Lukasz Tulczyjew, M. Kawulok, Nicolas Longépé, Bertrand Le Saux, J. Nalepa. 03 Aug 2022.
• PSO-Convolutional Neural Networks with Heterogeneous Learning Rate. N. H. Phong, A. Santos, B. Ribeiro. 20 May 2022.
• Regularization by Misclassification in ReLU Neural Networks. Elisabetta Cornacchia, Jan Hązła, Ido Nachum, Amir Yehudayoff. 03 Nov 2021. [NoLa]
• AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks. G. Bingham, Risto Miikkulainen. 18 Sep 2021. [ODL]
• Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers. Yunhui Guo, Xudong Wang, Yubei Chen, Stella X. Yu. 23 Jul 2021.
• Random Neural Networks in the Infinite Width Limit as Gaussian Processes. Boris Hanin. 04 Jul 2021. [BDL]
• The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization. Mufan Li, Mihai Nica, Daniel M. Roy. 07 Jun 2021.
• Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions. Ameya Dilip Jagtap, Yeonjong Shin, Kenji Kawaguchi, George Karniadakis. 20 May 2021. [ODL]
• Activation function design for deep networks: linearity and effective initialisation. Michael Murray, V. Abrol, Jared Tanner. 17 May 2021. [ODL, LLMSV]
• Deep limits and cut-off phenomena for neural networks. B. Avelin, A. Karlsson. 21 Apr 2021. [AI4CE]
• A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions. Arnulf Jentzen, Adrian Riekert. 01 Apr 2021. [MLT]
• Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases. Arnulf Jentzen, T. Kröger. 23 Feb 2021. [ODL]
• Deep ReLU Networks Preserve Expected Length. Boris Hanin, Ryan Jeong, David Rolnick. 21 Feb 2021.
• A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions. Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek. 19 Feb 2021.
• A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks. Asaf Noy, Yi Tian Xu, Y. Aflalo, Lihi Zelnik-Manor, Rong Jin. 12 Jan 2021.
• Advances in Electron Microscopy with Deep Learning. Jeffrey M. Ede. 04 Jan 2021.
• Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't. Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu. 22 Sep 2020. [AI4CE]
• Tensor Programs III: Neural Matrix Laws. Greg Yang. 22 Sep 2020.
• Review: Deep Learning in Electron Microscopy. Jeffrey M. Ede. 17 Sep 2020.
• Tensor Programs II: Neural Tangent Kernel for Any Architecture. Greg Yang. 25 Jun 2020.
• Non-convergence of stochastic gradient descent in the training of deep neural networks. Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek. 12 Jun 2020.
• Composite Travel Generative Adversarial Networks for Tabular and Sequential Population Synthesis. Godwin Badu-Marfo, Bilal Farooq, Zachary Patterson. 15 Apr 2020.
• A Survey of Deep Learning for Scientific Discovery. M. Raghu, Eric Schmidt. 26 Mar 2020. [OOD, AI4CE]
• Machine Learning from a Continuous Viewpoint. Weinan E, Chao Ma, Lei Wu. 30 Dec 2019.
• Optimization for deep learning: theory and algorithms. Ruoyu Sun. 19 Dec 2019. [ODL]
• Finite Depth and Width Corrections to the Neural Tangent Kernel. Boris Hanin, Mihai Nica. 13 Sep 2019. [MDE]
• Infinitely deep neural networks as diffusion processes. Stefano Peluchetti, Stefano Favaro. 27 May 2019. [ODL]
• Data driven approximation of parametrized PDEs by Reduced Basis and Neural Networks. N. D. Santo, S. Deparis, Luca Pegolotti. 02 Apr 2019.
• Interpreting Neural Networks Using Flip Points. Roozbeh Yousefzadeh, D. O’Leary. 21 Mar 2019. [AAML, FAtt]
• On the security relevance of weights in deep learning. Kathrin Grosse, T. A. Trost, Marius Mosbach, Michael Backes, Dietrich Klakow. 08 Feb 2019. [AAML]
• How to Start Training: The Effect of Initialization and Architecture. Boris Hanin, David Rolnick. 05 Mar 2018.
• The Loss Surfaces of Multilayer Networks. A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun. 30 Nov 2014. [ODL]