ResearchTrend.AI
SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training
Ildus Sadrtdinov, Ivan Klimov, E. Lobacheva, Dmitry Vetrov
arXiv:2505.23489 · 29 May 2025

Papers citing "SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training"

Showing 50 of 55 citing papers.
• Abrupt Learning in Transformers: A Case Study on Matrix Completion. Pulkit Gopalani, Ekdeep Singh Lubana, Wei Hu. 29 Oct 2024.
• Where Do Large Learning Rates Lead Us? Ildus Sadrtdinov, M. Kodryan, Eduard Pokonechny, E. Lobacheva, Dmitry Vetrov. 29 Oct 2024. [AI4CE]
• Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning. Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang. 22 Feb 2024.
• Few-shot Adaptation of Multi-modal Foundation Models: A Survey. Fan Liu, Tianshu Zhang, Wenwen Dai, Wenwen Cai, Xiaocong Zhou, Delong Chen. 03 Jan 2024. [VLM, OffRL]
• Why Do We Need Weight Decay in Modern Deep Learning? Maksym Andriushchenko, Francesco D'Angelo, Aditya Varre, Nicolas Flammarion. 06 Oct 2023.
• Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR). Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, W. Lin. 24 Sep 2023. [AI4CE]
• On the different regimes of Stochastic Gradient Descent. Antonio Sclocchi, Matthieu Wyart. 19 Sep 2023.
• Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs. Angelica Chen, Ravid Schwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra. 13 Sep 2023.
• Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks. F. Chen, D. Kunin, Atsushi Yamamura, Surya Ganguli. 07 Jun 2023.
• Phase transitions in the mini-batch size for sparse and dense two-layer neural networks. Raffaele Marino, F. Ricci-Tersenghi. 10 May 2023.
• Bayesian Free Energy of Deep ReLU Neural Network in Overparametrized Cases. Shuya Nagayasu, Sumio Watanabe. 28 Mar 2023. [BDL]
• Symbolic Discovery of Optimization Algorithms. Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, ..., Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le. 13 Feb 2023.
• SGD with Large Step Sizes Learns Sparse Features. Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion. 11 Oct 2022.
• Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes. M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov. 08 Sep 2022.
• Emergent Abilities of Large Language Models. Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus. 15 Jun 2022. [ELM, ReLM, LRM]
• Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, ..., Zhuoye Zhao, Zijian Wang, Zijie J. Wang, Zirui Wang, Ziyi Wu. 09 Jun 2022. [ELM]
• Towards Understanding Grokking: An Effective Theory of Representation Learning. Ziming Liu, O. Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams. 20 May 2022. [AI4CE]
• Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. Alethea Power, Yuri Burda, Harrison Edwards, Igor Babuschkin, Vedant Misra. 06 Jan 2022.
• Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein. 29 Sep 2021.
• The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion. D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.
• On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay. E. Lobacheva, M. Kodryan, Nadezhda Chirkova, A. Malinin, Dmitry Vetrov. 29 Jun 2021.
• Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar. 26 Feb 2021. [ODL]
• Learning Transferable Visual Models From Natural Language Supervision. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. 26 Feb 2021. [CLIP, VLM]
• Strength of Minibatch Noise in SGD. Liu Ziyin, Kangqiao Liu, Takashi Mori, Masakuni Ueda. 10 Feb 2021. [ODL, MLT]
• On the Origin of Implicit Regularization in Stochastic Gradient Descent. Samuel L. Smith, Benoit Dherin, David Barrett, Soham De. 28 Jan 2021. [MLT]
• Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent. Kangqiao Liu, Liu Ziyin, Masakuni Ueda. 07 Dec 2020. [MLT]
• Implicit Gradient Regularization. David Barrett, Benoit Dherin. 23 Sep 2020.
• On the Generalization Benefit of Noise in Stochastic Gradient Descent. Samuel L. Smith, Erich Elsen, Soham De. 26 Jun 2020. [MLT]
• wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli. 20 Jun 2020. [SSL]
• Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. 28 May 2020. [BDL]
• A Free-Energy Principle for Representation Learning. Yansong Gao, Pratik Chaudhari. 27 Feb 2020. [DRL]
• Scaling Laws for Neural Language Models. Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.
• Understanding Why Neural Networks Generalize Well Through GSNR of Parameters. Jinlong Liu, Guo-qing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang. 21 Jan 2020. [AI4CE]
• Big Transfer (BiT): General Visual Representation Learning. Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, J. Puigcerver, Jessica Yung, Sylvain Gelly, N. Houlsby. 24 Dec 2019. [MQ]
• Deep Double Descent: Where Bigger Models and More Data Hurt. Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever. 04 Dec 2019.
• Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks. Yuanzhi Li, Colin Wei, Tengyu Ma. 10 Jul 2019.
• Reconciling modern machine learning practice and the bias-variance trade-off. M. Belkin, Daniel J. Hsu, Siyuan Ma, Soumik Mandal. 28 Dec 2018.
• Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks. Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu. 21 Nov 2018. [ODL]
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018. [VLM, SSL, SSeg]
• Gradient Descent Provably Optimizes Over-parameterized Neural Networks. S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh. 04 Oct 2018. [MLT, ODL]
• Fluctuation-dissipation relations for stochastic gradient descent. Sho Yaida. 28 Sep 2018.
• TherML: Thermodynamics of Machine Learning. Alexander A. Alemi, Ian S. Fischer. 11 Jul 2018. [DRL, AI4CE]
• Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate. Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry. 05 Jun 2018. [FedML, MLT]
• Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning. Yao Zhang, Andrew M. Saxe, Madhu S. Advani, A. Lee. 05 Mar 2018.
• Essentially No Barriers in Neural Network Energy Landscape. Felix Dräxler, K. Veschgini, M. Salmhofer, Fred Hamprecht. 02 Mar 2018. [MoMe]
• Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs. T. Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, A. Wilson. 27 Feb 2018. [UQCV]
• The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning. Siyuan Ma, Raef Bassily, M. Belkin. 18 Dec 2017.
• Three Factors Influencing Minima in SGD. Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey. 13 Nov 2017.
• Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. Pratik Chaudhari, Stefano Soatto. 30 Oct 2017. [MLT]
• A Bayesian Perspective on Generalization and Stochastic Gradient Descent. Samuel L. Smith, Quoc V. Le. 17 Oct 2017. [BDL]