$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

11 February 2018

Papers citing "$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space"

Title
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion Binchi Zhang Zaiyi Zheng Zhengzhang Chen Wenlin Yao 99 0 0 01 Feb 2025
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms Han Xiao Kashif Rasul Roland Vollgraf 86 8,807 0 25 Aug 2017
The Shattered Gradients Problem: If resnets are the answer, then what is the question? David Balduzzi Marcus Frean Lennox Leary J. P. Lewis Kurt Wan-Duo Ma Brian McWilliams ODL 44 399 0 28 Feb 2017
Recurrent Neural Networks With Limited Numerical Precision Joachim Ott Zhouhan Lin Ying Zhang Shih-Chii Liu Yoshua Bengio MQ 51 77 0 24 Aug 2016
Identity Mappings in Deep Residual Networks Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun 240 10,149 0 16 Mar 2016
ADADELTA: An Adaptive Learning Rate Method Matthew D. Zeiler ODL 78 6,619 0 22 Dec 2012
Improving neural networks by preventing co-adaptation of feature detectors Geoffrey E. Hinton Nitish Srivastava A. Krizhevsky Ilya Sutskever Ruslan Salakhutdinov VLM 338 7,650 0 03 Jul 2012