SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training

29 May 2025

Papers citing "SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training"

5 / 55 papers shown

Title
Stochastic Gradient Descent as Approximate Bayesian Inference | Stephan Mandt, Matthew D. Hoffman, David M. Blei | BDL | 44 | 594 | 0 | 13 Apr 2017
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys | Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina | ODL | 84 | 769 | 0 | 06 Nov 2016
Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | MedIm | 1.4K | 192,638 | 0 | 10 Dec 2015
Deep Learning and the Information Bottleneck Principle | Naftali Tishby, Noga Zaslavsky | DRL | 126 | 1,570 | 0 | 09 Mar 2015
Practical recommendations for gradient-based training of deep architectures | Yoshua Bengio | 3DH, ODL | 146 | 2,195 | 0 | 24 Jun 2012