How to Start Training: The Effect of Initialization and Architecture

5 March 2018
Boris Hanin
David Rolnick
arXiv: 1803.01719

Papers citing "How to Start Training: The Effect of Initialization and Architecture"

50 / 59 papers shown
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
44
1
0
02 May 2025
Deep Neural Nets as Hamiltonians
Mike Winer
Boris Hanin
187
0
0
31 Mar 2025
Joint Segmentation and Image Reconstruction with Error Prediction in Photoacoustic Imaging using Deep Learning
Ruibo Shang
Geoffrey P. Luke
Matthew O'Donnell
UQCV
37
0
0
02 Jul 2024
Quantitative CLTs in Deep Neural Networks
Stefano Favaro
Boris Hanin
Domenico Marinucci
I. Nourdin
G. Peccati
BDL
33
12
0
12 Jul 2023
Variational Latent Branching Model for Off-Policy Evaluation
Qitong Gao
Ge Gao
Min Chi
Miroslav Pajic
OffRL
36
6
0
28 Jan 2023
Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
Hanna Tseran
Guido Montúfar
ODL
30
0
0
17 Jan 2023
Accelerating Dataset Distillation via Model Augmentation
Lei Zhang
Jie M. Zhang
Bowen Lei
Subhabrata Mukherjee
Xiang Pan
Bo Zhao
Caiwen Ding
Heng Chang
Dongkuan Xu
DD
47
62
0
12 Dec 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
28
0
0
20 Nov 2022
Dynamical Isometry for Residual Networks
Advait Gadhikar
R. Burkholz
ODL
AI4CE
40
2
0
05 Oct 2022
Model Zoos: A Dataset of Diverse Populations of Neural Network Models
Konstantin Schürholt
Diyar Taskiran
Boris Knyazev
Xavier Giró-i-Nieto
Damian Borth
60
29
0
29 Sep 2022
Improving Fine-tuning of Self-supervised Models with Contrastive Initialization
Haolin Pan
Yong Guo
Qinyi Deng
Hao-Fan Yang
Yiqun Chen
Jian Chen
SSL
23
19
0
30 Jul 2022
Scaling ResNets in the Large-depth Regime
Pierre Marion
Adeline Fermanian
Gérard Biau
Jean-Philippe Vert
26
16
0
14 Jun 2022
Analysis of Diffractive Neural Networks for Seeing Through Random Diffusers
Yuhang Li
Yilin Luo
Bijie Bai
Aydogan Ozcan
DiffM
27
11
0
01 May 2022
Vision-Based American Sign Language Classification Approach via Deep Learning
Nelly Elsayed
Zag ElSayed
Anthony Maida
VLM
19
3
0
08 Apr 2022
Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results
T. Ridnik
Hussam Lawen
Emanuel Ben-Baruch
Asaf Noy
40
11
0
07 Apr 2022
Modified DDPG car-following model with a real-world human driving experience with CARLA simulator
Dian-Tao Li
Ostap Okhrin
38
37
0
29 Dec 2021
SeMask: Semantically Masked Transformers for Semantic Segmentation
Jitesh Jain
Anukriti Singh
Nikita Orlov
Zilong Huang
Jiachen Li
Steven Walton
Humphrey Shi
ViT
29
93
0
23 Dec 2021
StyleSwin: Transformer-based GAN for High-resolution Image Generation
Bo Zhang
Shuyang Gu
Bo Zhang
Jianmin Bao
Dong Chen
Fang Wen
Yong Wang
B. Guo
ViT
38
223
0
20 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
75
678
0
02 Dec 2021
A Johnson-Lindenstrauss Framework for Randomly Initialized CNNs
Ido Nachum
Jan Hązła
Michael C. Gastpar
Anatoly Khina
36
0
0
03 Nov 2021
Hyper-Representations: Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction
Konstantin Schürholt
Dimche Kostadinov
Damian Borth
SSL
41
14
0
28 Oct 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
24
4
0
18 Sep 2021
Dynamic Neural Network Architectural and Topological Adaptation and Related Methods -- A Survey
Lorenz Kummer
AI4CE
40
0
0
28 Jul 2021
Deep-learning-driven Reliable Single-pixel Imaging with Uncertainty Approximation
Ruibo Shang
Mikaela A. O’Brien
Geoffrey P. Luke
UQCV
BDL
36
2
0
24 Jul 2021
Precise characterization of the prior predictive distribution of deep ReLU networks
Lorenzo Noci
Gregor Bachmann
Kevin Roth
Sebastian Nowozin
Thomas Hofmann
BDL
UQCV
29
32
0
11 Jun 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Mufan Li
Mihai Nica
Daniel M. Roy
32
33
0
07 Jun 2021
Activation function design for deep networks: linearity and effective initialisation
Michael Murray
V. Abrol
Jared Tanner
ODL
LLMSV
29
18
0
17 May 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch
  Normalization
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
43
36
0
16 Apr 2021
Sharp bounds for the number of regions of maxout networks and vertices of Minkowski sums
Guido Montúfar
Yue Ren
Leon Zhang
20
39
0
16 Apr 2021
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
Arnulf Jentzen
Adrian Riekert
MLT
34
13
0
01 Apr 2021
Proof-of-Learning: Definitions and Practice
Hengrui Jia
Mohammad Yaghini
Christopher A. Choquette-Choo
Natalie Dullerud
Anvith Thudi
Varun Chandrasekaran
Nicolas Papernot
AAML
25
99
0
09 Mar 2021
Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases
Arnulf Jentzen
T. Kröger
ODL
28
7
0
23 Feb 2021
Deep ReLU Networks Preserve Expected Length
Boris Hanin
Ryan Jeong
David Rolnick
29
14
0
21 Feb 2021
A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
Patrick Cheridito
Arnulf Jentzen
Adrian Riekert
Florian Rossmannek
28
24
0
19 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks
S. Yun
A. Wong
MQ
9
4
0
30 Nov 2020
SWIPENET: Object detection in noisy underwater images
Long Chen
Feixiang Zhou
Shengke Wang
Junyu Dong
Ning Li
Haiping Ma
Xin Wang
Huiyu Zhou
18
17
0
19 Oct 2020
Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't
E. Weinan
Chao Ma
Stephan Wojtowytsch
Lei Wu
AI4CE
22
133
0
22 Sep 2020
Tensor Programs III: Neural Matrix Laws
Greg Yang
14
44
0
22 Sep 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
36
79
0
17 Sep 2020
Deep Isometric Learning for Visual Recognition
Haozhi Qi
Chong You
Xueliang Wang
Yi Ma
Jitendra Malik
VLM
35
54
0
30 Jun 2020
Tensor Programs II: Neural Tangent Kernel for Any Architecture
Greg Yang
58
135
0
25 Jun 2020
Non-convergence of stochastic gradient descent in the training of deep neural networks
Patrick Cheridito
Arnulf Jentzen
Florian Rossmannek
14
37
0
12 Jun 2020
Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
Arnulf Jentzen
Timo Welti
22
15
0
03 Mar 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De
Samuel L. Smith
ODL
27
20
0
24 Feb 2020
LaProp: Separating Momentum and Adaptivity in Adam
Liu Ziyin
Zhikang T.Wang
Masahito Ueda
ODL
13
18
0
12 Feb 2020
Salvaging Federated Learning by Local Adaptation
Tao Yu
Eugene Bagdasaryan
Vitaly Shmatikov
FedML
25
260
0
12 Feb 2020
Multilevel Initialization for Layer-Parallel Deep Neural Network Training
E. Cyr
Stefanie Günther
J. Schroder
AI4CE
22
11
0
19 Dec 2019
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
27
168
0
19 Dec 2019