Fixup Initialization: Residual Learning Without Normalization
Hongyi Zhang, Yann N. Dauphin, Tengyu Ma · ODL, AI4CE · 27 January 2019
arXiv:1901.09321
Papers citing "Fixup Initialization: Residual Learning Without Normalization" (45 of 95 papers shown)
Free-viewpoint Indoor Neural Relighting from Multi-view Stereo
Julien Philip, Sébastien Morgenthaler, Michael Gharbi, G. Drettakis · 3DV · 24 Jun 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Mufan Li, Mihai Nica, Daniel M. Roy · 07 Jun 2021
NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results
Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, ..., Rao Muhammad Umer, Youliang Yan, Lei Yu, Magauiya Zhussip, X. Zou · SupR · 07 Jun 2021
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael Ruogu Zhang, Stanislav Fort, R. Zemel, Roger C. Grosse · MoMe · 22 Apr 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
43
36
0
16 Apr 2021
Going deeper with Image Transformers
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou · ViT · 31 Mar 2021
Large Batch Simulation for Deep Reinforcement Learning
Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, V. Koltun, Kayvon Fatahalian · 3DV, OffRL, AI4CE · 12 Mar 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, Yifan Jiang, Tom Goldstein · ODL · 16 Feb 2021
Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations
Winnie Xu, Ricky T. Q. Chen, Xuechen Li, David Duvenaud · BDL, UQCV · 12 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan · VLM · 11 Feb 2021
Optimizing Deeper Transformers on Small Datasets
Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie C.K. Cheung, S. Prince, Yanshuai Cao · AI4CE · 30 Dec 2020
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
R. Child · BDL, VLM · 20 Nov 2020
Nanopore Base Calling on the Edge
Peter Perešíni, V. Boža, Broňa Brejová, T. Vinař · 09 Nov 2020
Stable ResNet
Soufiane Hayou, Eugenio Clerico, Bo He, George Deligiannidis, Arnaud Doucet, Judith Rousseau · ODL, SSeg · 24 Oct 2020
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win
Utku Evci, Yani Andrew Ioannou, Cem Keskin, Yann N. Dauphin · 07 Oct 2020
Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
Yaniv Blumenfeld, D. Gilboa, Daniel Soudry · ODL · 02 Jul 2020
Deep Isometric Learning for Visual Recognition
Haozhi Qi, Chong You, Xueliang Wang, Yi Ma, Jitendra Malik · VLM · 30 Jun 2020
Improving robustness against common corruptions by covariate shift adaptation
Steffen Schneider, E. Rusak, L. Eck, Oliver Bringmann, Wieland Brendel, Matthias Bethge · VLM · 30 Jun 2020
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift
Zachary Nado, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji Lakshminarayanan, Jasper Snoek · OOD, AI4TS · 19 Jun 2020
Neural Networks and Value at Risk
Alexander Arimond, Damian Borth, Andreas G. F. Hoepner, M. Klawunn, S. Weisheit · 04 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever · VLM · 30 Apr 2020
Evolving Normalization-Activation Layers
Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le · 06 Apr 2020
Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster · 25 Mar 2020
ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley · AI4CE · 10 Mar 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith · ODL · 24 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu · AI4CE · 12 Feb 2020
A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks
Zhaodong Chen, Lei Deng, Bangyan Wang, Guoqi Li, Yuan Xie · 01 Jan 2020
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun · 25 Dec 2019
Towards Efficient Training for Neural Network Quantization
Qing Jin, Linjie Yang, Zhenyu A. Liao · MQ · 21 Dec 2019
Optimization for deep learning: theory and algorithms
Ruoyu Sun · ODL · 19 Dec 2019
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, A. Allauzen, Benoît Crabbé, Laurent Besacier, D. Schwab · AI4CE · 11 Dec 2019
Understanding and Improving Layer Normalization
Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin · FAtt · 16 Nov 2019
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu, Qingcan Wang, Chao Ma · ODL, AI4CE · 02 Nov 2019
An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding, Xuancheng Ren, Ruixuan Luo, Xu Sun · ODL · 27 Oct 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen, Julian Salazar · 14 Oct 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin · 25 Sep 2019
Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum
Yosuke Shinya, E. Simo-Serra, Taiji Suzuki · 09 Sep 2019
Attentive Normalization
Xilai Li, Wei Sun, Tianfu Wu · OOD, ViT · 04 Aug 2019
AutoML: A Survey of the State-of-the-Art
Xin He, Kaiyong Zhao, Xiaowen Chu · 02 Aug 2019
Multi-Scale Learned Iterative Reconstruction
A. Hauptmann, J. Adler, Simon Arridge, Ozan Oktem · 01 Aug 2019
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Kaifeng Lyu, Jian Li · 13 Jun 2019
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang · ODL · 28 May 2019
Universal Sound Separation
Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, K. Wilson, Jonathan Le Roux, J. Hershey · 08 May 2019
Gradient-Coherent Strong Regularization for Deep Neural Networks
Dae Hoon Park, C. Ho, Yi Chang, Huaqing Zhang · ODL · 20 Nov 2018
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington · 14 Jun 2018