Temperature check: theory and practice for training models with softmax-cross-entropy losses
arXiv:2010.07344, 14 October 2020
Atish Agarwala, Jeffrey Pennington, Yann N. Dauphin, S. Schoenholz
[UQCV]

Papers citing "Temperature check: theory and practice for training models with softmax-cross-entropy losses"

27 papers shown:
Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights. Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma. 12 May 2025.
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer. Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao. 07 Mar 2022.
Finite Versus Infinite Neural Networks: an Empirical Study. Jaehoon Lee, S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Narain Sohl-Dickstein. 31 Jul 2020.
The large learning rate phase of deep learning: the catapult mechanism. Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. 04 Mar 2020. [ODL]
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss. Lénaïc Chizat, Francis R. Bach. 11 Feb 2020. [MLT]
Disentangling Trainability and Generalization in Deep Neural Networks. Lechao Xiao, Jeffrey Pennington, S. Schoenholz. 30 Dec 2019.
Neural Tangents: Fast and Easy Infinite Neural Networks in Python. Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Narain Sohl-Dickstein, S. Schoenholz. 05 Dec 2019.
Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes. Greg Yang. 28 Oct 2019.
Why bigger is not always better: on finite and infinite neural networks. Laurence Aitchison. 17 Oct 2019.
Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics. Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo. 25 Jun 2019.
When Does Label Smoothing Help? Rafael Müller, Simon Kornblith, Geoffrey E. Hinton. 06 Jun 2019. [UQCV]
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent. Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington. 18 Feb 2019.
On Lazy Training in Differentiable Programming. Lénaïc Chizat, Edouard Oyallon, Francis R. Bach. 19 Dec 2018.
A Convergence Theory for Deep Learning via Over-Parameterization. Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song. 09 Nov 2018. [AI4CE, ODL]
DropBlock: A regularization method for convolutional networks. Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le. 30 Oct 2018.
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes. Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Narain Sohl-Dickstein. 11 Oct 2018. [UQCV, BDL]
Gradient Descent Provably Optimizes Over-parameterized Neural Networks. S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh. 04 Oct 2018. [MLT, ODL]
Improving Generalization via Scalable Neighborhood Component Analysis. Zhirong Wu, Alexei A. Efros, Stella X. Yu. 14 Aug 2018. [BDL]
Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. 20 Jun 2018.
On Calibration of Modern Neural Networks. Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger. 14 Jun 2017. [UQCV]
Regularizing Neural Networks by Penalizing Confident Output Distributions. Gabriel Pereyra, George Tucker, J. Chorowski, Lukasz Kaiser, Geoffrey E. Hinton. 23 Jan 2017. [NoLa]
Wide Residual Networks. Sergey Zagoruyko, N. Komodakis. 23 May 2016.
Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 10 Dec 2015. [MedIm]
Rethinking the Inception Architecture for Computer Vision. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Z. Wojna. 02 Dec 2015. [3DV, BDL]
Distilling the Knowledge in a Neural Network. Geoffrey E. Hinton, Oriol Vinyals, J. Dean. 09 Mar 2015. [FedML]
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Sergey Ioffe, Christian Szegedy. 11 Feb 2015. [OOD]
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. Junyoung Chung, Çağlar Gülçehre, Kyunghyun Cho, Yoshua Bengio. 11 Dec 2014.