Temperature check: theory and practice for training models with softmax-cross-entropy losses
arXiv:2010.07344, 14 October 2020
Atish Agarwala, Jeffrey Pennington, Yann N. Dauphin, S. Schoenholz
[UQCV]

Papers citing "Temperature check: theory and practice for training models with softmax-cross-entropy losses"

27 papers shown:
Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights. Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma. 12 May 2025.
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer. Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao. 07 Mar 2022.
Finite Versus Infinite Neural Networks: an Empirical Study. Jaehoon Lee, S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Narain Sohl-Dickstein. 31 Jul 2020.
The large learning rate phase of deep learning: the catapult mechanism. Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. 04 Mar 2020. [ODL]
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss. Lénaïc Chizat, Francis R. Bach. 11 Feb 2020. [MLT]
Disentangling Trainability and Generalization in Deep Neural Networks. Lechao Xiao, Jeffrey Pennington, S. Schoenholz. 30 Dec 2019.
Neural Tangents: Fast and Easy Infinite Neural Networks in Python. Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Narain Sohl-Dickstein, S. Schoenholz. 05 Dec 2019.
Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes. Greg Yang. 28 Oct 2019.
Why bigger is not always better: on finite and infinite neural networks. Laurence Aitchison. 17 Oct 2019.
Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics. Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo. 25 Jun 2019.
When Does Label Smoothing Help? Rafael Müller, Simon Kornblith, Geoffrey E. Hinton. 06 Jun 2019. [UQCV]
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent. Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington. 18 Feb 2019.
On Lazy Training in Differentiable Programming. Lénaïc Chizat, Edouard Oyallon, Francis R. Bach. 19 Dec 2018.
A Convergence Theory for Deep Learning via Over-Parameterization. Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song. 09 Nov 2018. [AI4CE, ODL]
DropBlock: A regularization method for convolutional networks. Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le. 30 Oct 2018.
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes. Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Narain Sohl-Dickstein. 11 Oct 2018. [UQCV, BDL]
Gradient Descent Provably Optimizes Over-parameterized Neural Networks. S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh. 04 Oct 2018. [MLT, ODL]
Improving Generalization via Scalable Neighborhood Component Analysis. Zhirong Wu, Alexei A. Efros, Stella X. Yu. 14 Aug 2018. [BDL]
Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. 20 Jun 2018.
On Calibration of Modern Neural Networks. Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger. 14 Jun 2017. [UQCV]
Regularizing Neural Networks by Penalizing Confident Output Distributions. Gabriel Pereyra, George Tucker, J. Chorowski, Lukasz Kaiser, Geoffrey E. Hinton. 23 Jan 2017. [NoLa]
Wide Residual Networks. Sergey Zagoruyko, N. Komodakis. 23 May 2016.
Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 10 Dec 2015. [MedIm]
Rethinking the Inception Architecture for Computer Vision. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Z. Wojna. 02 Dec 2015. [3DV, BDL]
Distilling the Knowledge in a Neural Network. Geoffrey E. Hinton, Oriol Vinyals, J. Dean. 09 Mar 2015. [FedML]
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Sergey Ioffe, Christian Szegedy. 11 Feb 2015. [OOD]
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. Junyoung Chung, Çağlar Gülçehre, Kyunghyun Cho, Yoshua Bengio. 11 Dec 2014.