Fixup Initialization: Residual Learning Without Normalization

27 January 2019
Hongyi Zhang
Yann N. Dauphin
Tengyu Ma
    ODL
    AI4CE

Papers citing "Fixup Initialization: Residual Learning Without Normalization"

45 / 95 papers shown
Free-viewpoint Indoor Neural Relighting from Multi-view Stereo
Julien Philip
Sébastien Morgenthaler
Michael Gharbi
G. Drettakis
3DV
40
51
0
24 Jun 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Mufan Li
Mihai Nica
Daniel M. Roy
32
33
0
07 Jun 2021
NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results
Goutam Bhat
Martin Danelljan
Radu Timofte
Kazutoshi Akita
Wooyeong Cho
...
Rao Muhammad Umer
Youliang Yan
Lei Yu
Magauiya Zhussip
X. Zou
SupR
22
39
0
07 Jun 2021
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas
Juhan Bae
Michael Ruogu Zhang
Stanislav Fort
R. Zemel
Roger C. Grosse
MoMe
172
28
0
22 Apr 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
43
36
0
16 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
27
988
0
31 Mar 2021
Large Batch Simulation for Deep Reinforcement Learning
Brennan Shacklett
Erik Wijmans
Aleksei Petrenko
Manolis Savva
Dhruv Batra
V. Koltun
Kayvon Fatahalian
3DV
OffRL
AI4CE
29
26
0
12 Mar 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
Yifan Jiang
Tom Goldstein
ODL
41
53
0
16 Feb 2021
Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations
Winnie Xu
Ricky T. Q. Chen
Xuechen Li
David Duvenaud
BDL
UQCV
27
46
0
12 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
AI4CE
24
69
0
30 Dec 2020
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
R. Child
BDL
VLM
56
339
0
20 Nov 2020
Nanopore Base Calling on the Edge
Peter Perešíni
V. Boža
Broňa Brejová
T. Vinař
19
38
0
09 Nov 2020
Stable ResNet
Soufiane Hayou
Eugenio Clerico
Bo He
George Deligiannidis
Arnaud Doucet
Judith Rousseau
ODL
SSeg
46
51
0
24 Oct 2020
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win
Utku Evci
Yani Andrew Ioannou
Cem Keskin
Yann N. Dauphin
35
87
0
07 Oct 2020
Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
Yaniv Blumenfeld
D. Gilboa
Daniel Soudry
ODL
30
13
0
02 Jul 2020
Deep Isometric Learning for Visual Recognition
Haozhi Qi
Chong You
Xueliang Wang
Yi Ma
Jitendra Malik
VLM
35
54
0
30 Jun 2020
Improving robustness against common corruptions by covariate shift adaptation
Steffen Schneider
E. Rusak
L. Eck
Oliver Bringmann
Wieland Brendel
Matthias Bethge
VLM
42
463
0
30 Jun 2020
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift
Zachary Nado
Shreyas Padhy
D. Sculley
Alexander D'Amour
Balaji Lakshminarayanan
Jasper Snoek
OOD
AI4TS
34
240
0
19 Jun 2020
Neural Networks and Value at Risk
Alexander Arimond
Damian Borth
Andreas G. F. Hoepner
M. Klawunn
S. Weisheit
8
8
0
04 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal
Heewoo Jun
Christine Payne
Jong Wook Kim
Alec Radford
Ilya Sutskever
VLM
52
722
0
30 Apr 2020
Evolving Normalization-Activation Layers
Hanxiao Liu
Andrew Brock
Karen Simonyan
Quoc V. Le
19
79
0
06 Apr 2020
Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson
Vitaliy Chiley
Abhinav Venigalla
Joel Hestness
Urs Koster
35
33
0
25 Mar 2020
ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
24
276
0
10 Mar 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De
Samuel L. Smith
ODL
27
20
0
24 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
26
949
0
12 Feb 2020
A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks
Zhaodong Chen
Lei Deng
Bangyan Wang
Guoqi Li
Yuan Xie
35
28
0
01 Jan 2020
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
22
108
0
25 Dec 2019
Towards Efficient Training for Neural Network Quantization
Qing Jin
Linjie Yang
Zhenyu A. Liao
MQ
19
42
0
21 Dec 2019
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
27
168
0
19 Dec 2019
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le
Loïc Vial
Jibril Frej
Vincent Segonne
Maximin Coavoux
Benjamin Lecouteux
A. Allauzen
Benoît Crabbé
Laurent Besacier
D. Schwab
AI4CE
49
395
0
11 Dec 2019
Understanding and Improving Layer Normalization
Jingjing Xu
Xu Sun
Zhiyuan Zhang
Guangxiang Zhao
Junyang Lin
FAtt
32
342
0
16 Nov 2019
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu
Qingcan Wang
Chao Ma
ODL
AI4CE
28
22
0
02 Nov 2019
An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding
Xuancheng Ren
Ruixuan Luo
Xu Sun
ODL
19
46
0
27 Oct 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen
Julian Salazar
38
224
0
14 Oct 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
43
584
0
25 Sep 2019
Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum
Yosuke Shinya
E. Simo-Serra
Taiji Suzuki
19
12
0
09 Sep 2019
Attentive Normalization
Xilai Li
Wei Sun
Tianfu Wu
OOD
ViT
28
31
0
04 Aug 2019
AutoML: A Survey of the State-of-the-Art
Xin He
Kaiyong Zhao
Xiaowen Chu
22
1,423
0
02 Aug 2019
Multi-Scale Learned Iterative Reconstruction
A. Hauptmann
J. Adler
Simon Arridge
Ozan Oktem
36
37
0
01 Aug 2019
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Kaifeng Lyu
Jian Li
52
324
0
13 Jun 2019
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Tianle Cai
Ruiqi Gao
Jikai Hou
Siyu Chen
Dong Wang
Di He
Zhihua Zhang
Liwei Wang
ODL
21
57
0
28 May 2019
Universal Sound Separation
Ilya Kavalerov
Scott Wisdom
Hakan Erdogan
Brian Patton
K. Wilson
Jonathan Le Roux
J. Hershey
11
184
0
08 May 2019
Gradient-Coherent Strong Regularization for Deep Neural Networks
Dae Hoon Park
C. Ho
Yi Chang
Huaqing Zhang
ODL
21
1
0
20 Nov 2018
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao
Yasaman Bahri
Jascha Narain Sohl-Dickstein
S. Schoenholz
Jeffrey Pennington
244
349
0
14 Jun 2018