1803.01719
Cited By
How to Start Training: The Effect of Initialization and Architecture
5 March 2018
Boris Hanin
David Rolnick
Papers citing "How to Start Training: The Effect of Initialization and Architecture"
50 / 59 papers shown
Title
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
44
1
0
02 May 2025
Deep Neural Nets as Hamiltonians
Mike Winer
Boris Hanin
187
0
0
31 Mar 2025
Joint Segmentation and Image Reconstruction with Error Prediction in Photoacoustic Imaging using Deep Learning
Ruibo Shang
Geoffrey P. Luke
Matthew O'Donnell
UQCV
37
0
0
02 Jul 2024
Quantitative CLTs in Deep Neural Networks
Stefano Favaro
Boris Hanin
Domenico Marinucci
I. Nourdin
G. Peccati
BDL
33
12
0
12 Jul 2023
Variational Latent Branching Model for Off-Policy Evaluation
Qitong Gao
Ge Gao
Min Chi
Miroslav Pajic
OffRL
36
6
0
28 Jan 2023
Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
Hanna Tseran
Guido Montúfar
ODL
30
0
0
17 Jan 2023
Accelerating Dataset Distillation via Model Augmentation
Lei Zhang
Jie M. Zhang
Bowen Lei
Subhabrata Mukherjee
Xiang Pan
Bo Zhao
Caiwen Ding
Heng Chang
Dongkuan Xu
DD
47
62
0
12 Dec 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
28
0
0
20 Nov 2022
Dynamical Isometry for Residual Networks
Advait Gadhikar
R. Burkholz
ODL
AI4CE
40
2
0
05 Oct 2022
Model Zoos: A Dataset of Diverse Populations of Neural Network Models
Konstantin Schürholt
Diyar Taskiran
Boris Knyazev
Xavier Giró-i-Nieto
Damian Borth
60
29
0
29 Sep 2022
Improving Fine-tuning of Self-supervised Models with Contrastive Initialization
Haolin Pan
Yong Guo
Qinyi Deng
Hao-Fan Yang
Yiqun Chen
Jian Chen
SSL
23
19
0
30 Jul 2022
Scaling ResNets in the Large-depth Regime
Pierre Marion
Adeline Fermanian
Gérard Biau
Jean-Philippe Vert
26
16
0
14 Jun 2022
Analysis of Diffractive Neural Networks for Seeing Through Random Diffusers
Yuhang Li
Yilin Luo
Bijie Bai
Aydogan Ozcan
DiffM
27
11
0
01 May 2022
Vision-Based American Sign Language Classification Approach via Deep Learning
Nelly Elsayed
Zag ElSayed
Anthony Maida
VLM
19
3
0
08 Apr 2022
Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results
T. Ridnik
Hussam Lawen
Emanuel Ben-Baruch
Asaf Noy
40
11
0
07 Apr 2022
Modified DDPG car-following model with a real-world human driving experience with CARLA simulator
Dian-Tao Li
Ostap Okhrin
38
37
0
29 Dec 2021
SeMask: Semantically Masked Transformers for Semantic Segmentation
Jitesh Jain
Anukriti Singh
Nikita Orlov
Zilong Huang
Jiachen Li
Steven Walton
Humphrey Shi
ViT
29
93
0
23 Dec 2021
StyleSwin: Transformer-based GAN for High-resolution Image Generation
Bowen Zhang
Shuyang Gu
Bo Zhang
Jianmin Bao
Dong Chen
Fang Wen
Yong Wang
B. Guo
ViT
38
223
0
20 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
75
678
0
02 Dec 2021
A Johnson-Lindenstrauss Framework for Randomly Initialized CNNs
Ido Nachum
Jan Hązła
Michael C. Gastpar
Anatoly Khina
36
0
0
03 Nov 2021
Hyper-Representations: Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction
Konstantin Schürholt
Dimche Kostadinov
Damian Borth
SSL
41
14
0
28 Oct 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
24
4
0
18 Sep 2021
Dynamic Neural Network Architectural and Topological Adaptation and Related Methods – A Survey
Lorenz Kummer
AI4CE
40
0
0
28 Jul 2021
Deep-learning-driven Reliable Single-pixel Imaging with Uncertainty Approximation
Ruibo Shang
Mikaela A. O’Brien
Geoffrey P. Luke
UQCV
BDL
36
2
0
24 Jul 2021
Precise characterization of the prior predictive distribution of deep ReLU networks
Lorenzo Noci
Gregor Bachmann
Kevin Roth
Sebastian Nowozin
Thomas Hofmann
BDL
UQCV
29
32
0
11 Jun 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Mufan Li
Mihai Nica
Daniel M. Roy
32
33
0
07 Jun 2021
Activation function design for deep networks: linearity and effective initialisation
Michael Murray
V. Abrol
Jared Tanner
ODL
LLMSV
29
18
0
17 May 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
43
36
0
16 Apr 2021
Sharp bounds for the number of regions of maxout networks and vertices of Minkowski sums
Guido Montúfar
Yue Ren
Leon Zhang
20
39
0
16 Apr 2021
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
Arnulf Jentzen
Adrian Riekert
MLT
34
13
0
01 Apr 2021
Proof-of-Learning: Definitions and Practice
Hengrui Jia
Mohammad Yaghini
Christopher A. Choquette-Choo
Natalie Dullerud
Anvith Thudi
Varun Chandrasekaran
Nicolas Papernot
AAML
25
99
0
09 Mar 2021
Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases
Arnulf Jentzen
T. Kröger
ODL
28
7
0
23 Feb 2021
Deep ReLU Networks Preserve Expected Length
Boris Hanin
Ryan Jeong
David Rolnick
29
14
0
21 Feb 2021
A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
Patrick Cheridito
Arnulf Jentzen
Adrian Riekert
Florian Rossmannek
28
24
0
19 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks
S. Yun
A. Wong
MQ
9
4
0
30 Nov 2020
SWIPENET: Object detection in noisy underwater images
Long Chen
Feixiang Zhou
Shengke Wang
Junyu Dong
Ning Li
Haiping Ma
Xin Wang
Huiyu Zhou
18
17
0
19 Oct 2020
Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't
Weinan E
Chao Ma
Stephan Wojtowytsch
Lei Wu
AI4CE
22
133
0
22 Sep 2020
Tensor Programs III: Neural Matrix Laws
Greg Yang
14
44
0
22 Sep 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
36
79
0
17 Sep 2020
Deep Isometric Learning for Visual Recognition
Haozhi Qi
Chong You
Xueliang Wang
Yi Ma
Jitendra Malik
VLM
35
54
0
30 Jun 2020
Tensor Programs II: Neural Tangent Kernel for Any Architecture
Greg Yang
58
135
0
25 Jun 2020
Non-convergence of stochastic gradient descent in the training of deep neural networks
Patrick Cheridito
Arnulf Jentzen
Florian Rossmannek
14
37
0
12 Jun 2020
Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
Arnulf Jentzen
Timo Welti
22
15
0
03 Mar 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De
Samuel L. Smith
ODL
27
20
0
24 Feb 2020
LaProp: Separating Momentum and Adaptivity in Adam
Liu Ziyin
Zhikang T. Wang
Masahito Ueda
ODL
13
18
0
12 Feb 2020
Salvaging Federated Learning by Local Adaptation
Tao Yu
Eugene Bagdasaryan
Vitaly Shmatikov
FedML
25
260
0
12 Feb 2020
Multilevel Initialization for Layer-Parallel Deep Neural Network Training
E. Cyr
Stefanie Günther
J. Schroder
AI4CE
22
11
0
19 Dec 2019
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
27
168
0
19 Dec 2019