Fixup Initialization: Residual Learning Without Normalization
Hongyi Zhang, Yann N. Dauphin, Tengyu Ma · ODL, AI4CE · 27 January 2019
arXiv:1901.09321
Papers citing "Fixup Initialization: Residual Learning Without Normalization" (45 of 95 papers shown)
Free-viewpoint Indoor Neural Relighting from Multi-view Stereo
Julien Philip, Sébastien Morgenthaler, Michael Gharbi, G. Drettakis · 3DV · 24 Jun 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Mufan Li, Mihai Nica, Daniel M. Roy · 07 Jun 2021
NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results
Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, ..., Rao Muhammad Umer, Youliang Yan, Lei Yu, Magauiya Zhussip, X. Zou · SupR · 07 Jun 2021
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael Ruogu Zhang, Stanislav Fort, R. Zemel, Roger C. Grosse · MoMe · 22 Apr 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
43
36
0
16 Apr 2021
Going deeper with Image Transformers
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou · ViT · 31 Mar 2021
Large Batch Simulation for Deep Reinforcement Learning
Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, V. Koltun, Kayvon Fatahalian · 3DV, OffRL, AI4CE · 12 Mar 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, Yifan Jiang, Tom Goldstein · ODL · 16 Feb 2021
Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations
Winnie Xu, Ricky T. Q. Chen, Xuechen Li, David Duvenaud · BDL, UQCV · 12 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan · VLM · 11 Feb 2021
Optimizing Deeper Transformers on Small Datasets
Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie C.K. Cheung, S. Prince, Yanshuai Cao · AI4CE · 30 Dec 2020
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
R. Child · BDL, VLM · 20 Nov 2020
Nanopore Base Calling on the Edge
Peter Perešíni, V. Boža, Broňa Brejová, T. Vinař · 09 Nov 2020
Stable ResNet
Soufiane Hayou, Eugenio Clerico, Bo He, George Deligiannidis, Arnaud Doucet, Judith Rousseau · ODL, SSeg · 24 Oct 2020
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win
Utku Evci, Yani Andrew Ioannou, Cem Keskin, Yann N. Dauphin · 07 Oct 2020
Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
Yaniv Blumenfeld, D. Gilboa, Daniel Soudry · ODL · 02 Jul 2020
Deep Isometric Learning for Visual Recognition
Haozhi Qi, Chong You, Xueliang Wang, Yi Ma, Jitendra Malik · VLM · 30 Jun 2020
Improving robustness against common corruptions by covariate shift adaptation
Steffen Schneider, E. Rusak, L. Eck, Oliver Bringmann, Wieland Brendel, Matthias Bethge · VLM · 30 Jun 2020
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift
Zachary Nado, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji Lakshminarayanan, Jasper Snoek · OOD, AI4TS · 19 Jun 2020
Neural Networks and Value at Risk
Alexander Arimond, Damian Borth, Andreas G. F. Hoepner, M. Klawunn, S. Weisheit · 04 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever · VLM · 30 Apr 2020
Evolving Normalization-Activation Layers
Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le · 06 Apr 2020
Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster · 25 Mar 2020
ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley · AI4CE · 10 Mar 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith · ODL · 24 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu · AI4CE · 12 Feb 2020
A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks
Zhaodong Chen, Lei Deng, Bangyan Wang, Guoqi Li, Yuan Xie · 01 Jan 2020
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun · 25 Dec 2019
Towards Efficient Training for Neural Network Quantization
Qing Jin, Linjie Yang, Zhenyu A. Liao · MQ · 21 Dec 2019
Optimization for deep learning: theory and algorithms
Ruoyu Sun · ODL · 19 Dec 2019
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, A. Allauzen, Benoît Crabbé, Laurent Besacier, D. Schwab · AI4CE · 11 Dec 2019
Understanding and Improving Layer Normalization
Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin · FAtt · 16 Nov 2019
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu, Qingcan Wang, Chao Ma · ODL, AI4CE · 02 Nov 2019
An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding, Xuancheng Ren, Ruixuan Luo, Xu Sun · ODL · 27 Oct 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen, Julian Salazar · 14 Oct 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin · 25 Sep 2019
Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum
Yosuke Shinya, E. Simo-Serra, Taiji Suzuki · 09 Sep 2019
Attentive Normalization
Xilai Li, Wei Sun, Tianfu Wu · OOD, ViT · 04 Aug 2019
AutoML: A Survey of the State-of-the-Art
Xin He, Kaiyong Zhao, Xiaowen Chu · 02 Aug 2019
Multi-Scale Learned Iterative Reconstruction
A. Hauptmann, J. Adler, Simon Arridge, Ozan Oktem · 01 Aug 2019
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Kaifeng Lyu, Jian Li · 13 Jun 2019
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang · ODL · 28 May 2019
Universal Sound Separation
Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, K. Wilson, Jonathan Le Roux, J. Hershey · 08 May 2019
Gradient-Coherent Strong Regularization for Deep Neural Networks
Dae Hoon Park, C. Ho, Yi Chang, Huaqing Zhang · ODL · 20 Nov 2018
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington · 14 Jun 2018