On the distance between two neural networks and the stability of learning
arXiv:2002.03432
9 February 2020
Jeremy Bernstein
Arash Vahdat
Yisong Yue
Ming-Yu Liu
ODL
Papers citing "On the distance between two neural networks and the stability of learning" (37 papers)
Deep Sturm–Liouville: From Sample-Based to 1D Regularization with Learnable Orthogonal Basis Functions
David Vigouroux
Joseba Dalmau
Louis Bethune
Victor Boutin
21
0
0
09 Apr 2025
Function-Space Learning Rates
Edward Milsom
Ben Anson
Laurence Aitchison
54
1
0
24 Feb 2025
Learning Versatile Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
138
0
0
22 Jan 2025
Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning
Muyun Li
Aaron Fainman
Stefan Vlaski
36
0
0
06 Jan 2025
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson
Bettina Messmer
Martin Jaggi
AI4CE
18
2
0
31 Oct 2024
Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon
Hamza Tahir Chaudhry
C. Pehlevan
AI4CE
44
9
0
24 May 2024
Scalable Optimization in the Modular Norm
Tim Large
Yang Liu
Minyoung Huh
Hyojin Bahng
Phillip Isola
Jeremy Bernstein
44
12
0
23 May 2024
A Differential Geometric View and Explainability of GNN on Evolving Graphs
Yazheng Liu
Xi Zhang
Sihong Xie
19
3
0
11 Mar 2024
Automatic Optimisation of Normalised Neural Networks
Namhoon Cho
Hyo-Sang Shin
27
1
0
17 Dec 2023
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras
M. Aittala
J. Lehtinen
Janne Hellsten
Timo Aila
S. Laine
28
155
0
05 Dec 2023
Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao
Guisong Chang
Jiaqi Zhang
Qi Zhang
Dazhou Li
Yu Zhang
ODL
24
0
0
06 Nov 2023
A Spectral Condition for Feature Learning
Greg Yang
James B. Simon
Jeremy Bernstein
22
25
0
26 Oct 2023
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon
Lorenzo Noci
Mufan Bill Li
Boris Hanin
C. Pehlevan
27
23
0
28 Sep 2023
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
36
2
0
14 Jul 2023
On the Weight Dynamics of Deep Normalized Networks
Christian H. X. Ali Mehmeti-Göpel
Michael Wand
30
1
0
01 Jun 2023
Automatic Gradient Descent: Deep Learning without Hyperparameters
Jeremy Bernstein
Chris Mingard
Kevin Huang
Navid Azizan
Yisong Yue
ODL
16
17
0
11 Apr 2023
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
Maor Ivgi
Oliver Hinder
Y. Carmon
ODL
26
56
0
08 Feb 2023
Efficient Parametric Approximations of Neural Network Function Space Distance
Nikita Dhawan
Sicong Huang
Juhan Bae
Roger C. Grosse
14
5
0
07 Feb 2023
On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance
Guoqiang Zhang
ODL
8
4
0
02 Feb 2023
VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz
James Harrison
C. Freeman
Amil Merchant
Lucas Beyer
...
Naman Agrawal
Ben Poole
Igor Mordatch
Adam Roberts
Jascha Narain Sohl-Dickstein
29
60
0
17 Nov 2022
Optimisation & Generalisation in Networks of Neurons
Jeremy Bernstein
AI4CE
21
2
0
18 Oct 2022
Learning to Optimize Quasi-Newton Methods
Isaac Liao
Rumen Dangovski
Jakob N. Foerster
Marin Soljacic
36
4
0
11 Oct 2022
A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning
Kushal Chakrabarti
Nikhil Chopra
ODL
AI4CE
11
6
0
04 Jun 2022
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang
Kenta Niwa
W. Kleijn
ODL
11
2
0
24 Mar 2022
Towards understanding deep learning with the natural clustering prior
Simon Carbonnelle
13
0
0
15 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
26
148
0
07 Mar 2022
Deep Bayesian inference for seismic imaging with tasks
Ali Siahkoohi
G. Rizzuti
Felix J. Herrmann
BDL
UQCV
38
21
0
10 Oct 2021
Guiding Evolutionary Strategies by Differentiable Robot Simulators
Vladislav Kurenkov
Bulat Maksudov
29
2
0
01 Oct 2021
Online-Learning Deep Neuro-Adaptive Dynamic Inversion Controller for Model Free Control
Nathan Lutes
K. Krishnamurthy
Venkata Sriram Siddhardh Nadendla
S. Balakrishnan
17
0
0
21 Jul 2021
KOALA: A Kalman Optimization Algorithm with Loss Adaptivity
A. Davtyan
Sepehr Sameni
L. Cerkezi
Givi Meishvili
Adam Bielski
Paolo Favaro
ODL
51
2
0
07 Jul 2021
Solving hybrid machine learning tasks by traversing weight space geodesics
G. Raghavan
Matt Thomson
10
0
0
05 Jun 2021
Learning by Turning: Neural Architecture Aware Optimisation
Yang Liu
Jeremy Bernstein
M. Meister
Yisong Yue
ODL
41
26
0
14 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Juntang Zhuang
Tommy M. Tang
Yifan Ding
S. Tatikonda
Nicha Dvornek
X. Papademetris
James S. Duncan
ODL
11
499
0
15 Oct 2020
Why Spectral Normalization Stabilizes GANs: Analysis and Improvements
Zinan Lin
Vyas Sekar
Giulia Fanti
6
49
0
06 Sep 2020
Learning compositional functions via multiplicative weight updates
Jeremy Bernstein
Jiawei Zhao
M. Meister
Ming-Yu Liu
Anima Anandkumar
Yisong Yue
6
26
0
25 Jun 2020
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao
Yasaman Bahri
Jascha Narain Sohl-Dickstein
S. Schoenholz
Jeffrey Pennington
220
348
0
14 Jun 2018