On the interplay between noise and curvature and its effect on optimization and generalization

18 June 2019
Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux
arXiv: 1906.07774

Papers citing "On the interplay between noise and curvature and its effect on optimization and generalization"

22 papers

An Improved Empirical Fisher Approximation for Natural Gradient Descent
Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland
10 Jun 2024

Fading memory as inductive bias in residual recurrent networks
I. Dubinin, Felix Effenberger
27 Jul 2023

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn, B. Rosenow
08 Jun 2023

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
Riade Benbaki, Wenyu Chen, X. Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder
28 Feb 2023

On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba, Hossein Azizpour, Mårten Björkman
28 Jan 2023

How to select an objective function using information theory
T. Hodson, T. Over, T. Smith, Lucy M. Marshall
10 Dec 2022

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang, Yongyi Mao
19 Nov 2022

Noise Injection as a Probe of Deep Learning Dynamics
Noam Levi, I. Bloch, M. Freytsis, T. Volansky
24 Oct 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, Shaoshuai Shi, Wei Wang, Bo Li
30 Jun 2022

Hybrid quantum ResNet for car classification and its hyperparameter optimization
Asel Sagingalieva, Mohammad Kordzanganeh, Andrii Kurkin, Artem Melnikov, Daniil Kuhmistrov, M. Perelshtein, A. Melnikov, Andrea Skolik, David Von Dollen
10 May 2022

On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li
31 Jan 2022

A generalization gap estimation for overparameterized models via the Langevin functional variance
Akifumi Okuno, Keisuke Yano
07 Dec 2021

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Alexandre Ramé, Corentin Dancette, Matthieu Cord
07 Sep 2021

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins
19 Jul 2021

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
Shaoshuai Shi, Lin Zhang, Bo Li
14 Jul 2021

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
Elias Frantar, Eldar Kurtic, Dan Alistarh
07 Jul 2021

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics
Charles H. Martin, Michael W. Mahoney
01 Jun 2021

Deep Learning is Singular, and That's Good
Daniel Murfet, Susan Wei, Biwei Huang, Hui Li, Jesse Gell-Redman, T. Quella
22 Oct 2020

When Does Preconditioning Help or Hurt Generalization?
S. Amari, Jimmy Ba, Roger C. Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
18 Jun 2020

Scalable and Practical Natural Gradient for Large-Scale Deep Learning
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
13 Feb 2020

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy
06 Nov 2019

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 Sep 2016