ResearchTrend.AI
arXiv:1901.09321
Fixup Initialization: Residual Learning Without Normalization
Hongyi Zhang, Yann N. Dauphin, Tengyu Ma (27 January 2019) [ODL, AI4CE]

Papers citing "Fixup Initialization: Residual Learning Without Normalization"
50 / 88 papers shown
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, Joel Hestness (02 May 2025)

SpINR: Neural Volumetric Reconstruction for FMCW Radars
Harshvardhan Takawale, Nirupam Roy (30 Mar 2025)

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xianrui Li, ..., Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, Shiwei Liu (24 Feb 2025) [MQ]

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu (12 Jan 2025)

Fast Training of Sinusoidal Neural Fields via Scaling Initialization
Taesun Yeom, Sangyoon Lee, Jaeho Lee (07 Oct 2024)

Benchmarking the Attribution Quality of Vision Models
Robin Hesse, Simone Schaub-Meyer, Stefan Roth (16 Jul 2024) [FAtt]

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann (29 May 2024)
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
Raphael Barboni, Gabriel Peyré, François-Xavier Vialard (19 Mar 2024)

Principled Weight Initialization for Hypernetworks
Oscar Chang, Lampros Flokas, Hod Lipson (13 Dec 2023)

Quantitative CLTs in Deep Neural Networks
Stefano Favaro, Boris Hanin, Domenico Marinucci, I. Nourdin, G. Peccati (12 Jul 2023) [BDL]

The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations
Inyoung Paik, Jaesik Choi (23 Apr 2023)

The R-mAtrIx Net
Shailesh Lal, Suvajit Majumder, E. Sobko (14 Apr 2023)

Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding
Chunyan Xiong, Meng Lu, Xiaotong Yu, Jian-Peng Cao, Zhong Chen, D. Guo, X. Qu (14 Apr 2023) [MLT]
QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms
Guillaume Berger, Manik Dhingra, Antoine Mercier, Yash Savani, Sunny Panchal, Fatih Porikli (08 Mar 2023) [SupR]

Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Boris Knyazev, Doha Hwang, Simon Lacoste-Julien (07 Mar 2023) [AI4CE]

Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy
Cheolhyoung Lee, Kyunghyun Cho (08 Feb 2023)

A Survey on Efficient Training of Transformers
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen (02 Feb 2023)

Scaling laws for single-agent reinforcement learning
Jacob Hilton, Jie Tang, John Schulman (31 Jan 2023)

Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
Hanna Tseran, Guido Montúfar (17 Jan 2023) [ODL]

REPAIR: REnormalizing Permuted Activations for Interpolation Repair
Keller Jordan, Hanie Sedghi, O. Saukh, R. Entezari, Behnam Neyshabur (15 Nov 2022) [MoMe]
Do Bayesian Neural Networks Need To Be Fully Stochastic?
Mrinank Sharma, Sebastian Farquhar, Eric T. Nalisnick, Tom Rainforth (11 Nov 2022) [BDL]

NoMorelization: Building Normalizer-Free Models from a Sample's Perspective
Chang-Shu Liu, Yuwen Yang, Yue Ding, Hongtao Lu (13 Oct 2022)

SML: Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Wenlong Deng, Lang Lang, Ziqiang Liu, B. Liu (09 Oct 2022)

Dynamical Isometry for Residual Networks
Advait Gadhikar, R. Burkholz (05 Oct 2022) [ODL, AI4CE]

Removing Batch Normalization Boosts Adversarial Training
Haotao Wang, Aston Zhang, Shuai Zheng, Xingjian Shi, Mu Li, Zhangyang Wang (04 Jul 2022)

Cold Posteriors through PAC-Bayes
Konstantinos Pitas, Julyan Arbel (22 Jun 2022)

Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
Javier Antorán, David Janz, J. Allingham, Erik A. Daxberger, Riccardo Barbano, Eric T. Nalisnick, José Miguel Hernández-Lobato (17 Jun 2022) [UQCV, BDL]
Fast Finite Width Neural Tangent Kernel
Roman Novak, Jascha Narain Sohl-Dickstein, S. Schoenholz (17 Jun 2022) [AAML]

Scaling ResNets in the Large-depth Regime
Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert (14 Jun 2022)

Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim (15 May 2022)

Online Convolutional Re-parameterization
Mu Hu, Junyi Feng, Jiashen Hua, Baisheng Lai, Jianqiang Huang, Xiaojin Gong, Xiansheng Hua (02 Apr 2022)

Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang, Aleksandar Botev, James Martens (15 Mar 2022) [OffRL]

Acceleration of Federated Learning with Alleviated Forgetting in Local Training
Chencheng Xu, Zhiwei Hong, Minlie Huang, Tao Jiang (05 Mar 2022) [FedML]

DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei (01 Mar 2022) [MoE, AI4CE]
TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar, Anthony Sarah, Sairam Sundaresan (24 Feb 2022) [MQ]

Bayesian Model Selection, the Marginal Likelihood, and Generalization
Sanae Lotfi, Pavel Izmailov, Gregory W. Benton, Micah Goldblum, A. Wilson (23 Feb 2022) [UQCV, BDL]

Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations
Alexander Immer, Tycho F. A. van der Ouderaa, Gunnar Rätsch, Vincent Fortuin, Mark van der Wilk (22 Feb 2022) [BDL]

Understanding AdamW through Proximal Methods and Scale-Freeness
Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona (31 Jan 2022)

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, Sanjeev Arora (30 Nov 2021) [FedML, AAML, SILM]

Hidden-Fold Networks: Random Recurrent Residuals Using Sparse Supermasks
Ángel López García-Arias, Masanori Hashimoto, Masato Motomura, Jaehoon Yu (24 Nov 2021)

A Johnson–Lindenstrauss Framework for Randomly Initialized CNNs
Ido Nachum, Jan Hązła, Michael C. Gastpar, Anatoly Khina (03 Nov 2021)
A Loss Curvature Perspective on Training Instability in Deep Learning
Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat (08 Oct 2021) [ODL]

AdjointBackMapV2: Precise Reconstruction of Arbitrary CNN Unit's Activation via Adjoint Operators
Qing Wan, Siu Wun Cheung, Yoonsuck Choe (04 Oct 2021)

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham, Risto Miikkulainen (18 Sep 2021) [ODL]

Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta, Éva Székely, Jonas Beskow, G. Henter (30 Aug 2021)

StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement
Yuda Song, Hui Qian, Xin Du (27 Jul 2021)

The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Mufan Li, Mihai Nica, Daniel M. Roy (07 Jun 2021)

NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results
Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, ..., Rao Muhammad Umer, Youliang Yan, Lei Yu, Magauiya Zhussip, X. Zou (07 Jun 2021) [SupR]

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael Ruogu Zhang, Stanislav Fort, R. Zemel, Roger C. Grosse (22 Apr 2021) [MoMe]

"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang (16 Apr 2021) [MQ]