On the validity of kernel approximations for orthogonally-initialized neural networks

13 April 2021

Papers citing "On the validity of kernel approximations for orthogonally-initialized neural networks"

Title
Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation | Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andy Brock, Samuel L. Smith, Yee Whye Teh | 85 30 0 | 20 Feb 2023
Deep equilibrium networks are sensitive to initialization statistics | Atish Agarwala, S. Schoenholz | 93 7 0 | 19 Jul 2022
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers | Guodong Zhang, Aleksandar Botev, James Martens | OffRL | 83 28 0 | 15 Mar 2022