On the distance between two neural networks and the stability of learning

Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu
arXiv:2002.03432 · 9 February 2020 · ODL

Papers citing "On the distance between two neural networks and the stability of learning"
37 papers shown

Deep Sturm–Liouville: From Sample-Based to 1D Regularization with Learnable Orthogonal Basis Functions
David Vigouroux, Joseba Dalmau, Louis Bethune, Victor Boutin
09 Apr 2025

Function-Space Learning Rates
Edward Milsom, Ben Anson, Laurence Aitchison
24 Feb 2025

Learning Versatile Optimizers on a Compute Diet
A. Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky
22 Jan 2025

Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning
Muyun Li, Aaron Fainman, Stefan Vlaski
06 Jan 2025

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson, Bettina Messmer, Martin Jaggi
31 Oct 2024 · AI4CE

Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon, Hamza Tahir Chaudhry, C. Pehlevan
24 May 2024 · AI4CE

Scalable Optimization in the Modular Norm
Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein
23 May 2024

A Differential Geometric View and Explainability of GNN on Evolving Graphs
Yazheng Liu, Xi Zhang, Sihong Xie
11 Mar 2024

Automatic Optimisation of Normalised Neural Networks
Namhoon Cho, Hyo-Sang Shin
17 Dec 2023

Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras, M. Aittala, J. Lehtinen, Janne Hellsten, Timo Aila, S. Laine
05 Dec 2023

Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao, Guisong Chang, Jiaqi Zhang, Qi Zhang, Dazhou Li, Yu Zhang
06 Nov 2023 · ODL

A Spectral Condition for Feature Learning
Greg Yang, James B. Simon, Jeremy Bernstein
26 Oct 2023

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, C. Pehlevan
28 Sep 2023

Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas, Nikolaos Passalis, Anastasios Tefas
14 Jul 2023 · AAML, OOD

On the Weight Dynamics of Deep Normalized Networks
Christian H. X. Ali Mehmeti-Göpel, Michael Wand
01 Jun 2023

Automatic Gradient Descent: Deep Learning without Hyperparameters
Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue
11 Apr 2023 · ODL

DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
Maor Ivgi, Oliver Hinder, Y. Carmon
08 Feb 2023 · ODL

Efficient Parametric Approximations of Neural Network Function Space Distance
Nikita Dhawan, Sicong Huang, Juhan Bae, Roger C. Grosse
07 Feb 2023

On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance
Guoqiang Zhang
02 Feb 2023 · ODL

VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz, James Harrison, C. Freeman, Amil Merchant, Lucas Beyer, ..., Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Narain Sohl-Dickstein
17 Nov 2022

Optimisation & Generalisation in Networks of Neurons
Jeremy Bernstein
18 Oct 2022 · AI4CE

Learning to Optimize Quasi-Newton Methods
Isaac Liao, Rumen Dangovski, Jakob N. Foerster, Marin Soljacic
11 Oct 2022

A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning
Kushal Chakrabarti, Nikhil Chopra
04 Jun 2022 · ODL, AI4CE

A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang, Kenta Niwa, W. Kleijn
24 Mar 2022 · ODL

Towards understanding deep learning with the natural clustering prior
Simon Carbonnelle
15 Mar 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao
07 Mar 2022

Deep Bayesian inference for seismic imaging with tasks
Ali Siahkoohi, G. Rizzuti, Felix J. Herrmann
10 Oct 2021 · BDL, UQCV

Guiding Evolutionary Strategies by Differentiable Robot Simulators
Vladislav Kurenkov, Bulat Maksudov
01 Oct 2021

Online-Learning Deep Neuro-Adaptive Dynamic Inversion Controller for Model Free Control
Nathan Lutes, K. Krishnamurthy, Venkata Sriram Siddhardh Nadendla, S. Balakrishnan
21 Jul 2021

KOALA: A Kalman Optimization Algorithm with Loss Adaptivity
A. Davtyan, Sepehr Sameni, L. Cerkezi, Givi Meishvili, Adam Bielski, Paolo Favaro
07 Jul 2021 · ODL

Solving hybrid machine learning tasks by traversing weight space geodesics
G. Raghavan, Matt Thomson
05 Jun 2021

Learning by Turning: Neural Architecture Aware Optimisation
Yang Liu, Jeremy Bernstein, M. Meister, Yisong Yue
14 Feb 2021 · ODL

High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
11 Feb 2021 · VLM

AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Juntang Zhuang, Tommy M. Tang, Yifan Ding, S. Tatikonda, Nicha Dvornek, X. Papademetris, James S. Duncan
15 Oct 2020 · ODL

Why Spectral Normalization Stabilizes GANs: Analysis and Improvements
Zinan Lin, Vyas Sekar, Giulia Fanti
06 Sep 2020

Learning compositional functions via multiplicative weight updates
Jeremy Bernstein, Jiawei Zhao, M. Meister, Ming-Yu Liu, Anima Anandkumar, Yisong Yue
25 Jun 2020

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington
14 Jun 2018