arXiv:1503.05671
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
Versions: v1, v2, v3, v4, v5, v6, v7 (latest)

19 March 2015
James Martens
Roger C. Grosse
    ODL

Papers citing "Optimizing Neural Networks with Kronecker-factored Approximate Curvature"

50 / 645 papers shown
Title
Tradeoffs of Diagonal Fisher Information Matrix Estimators
Alexander Soen
Ke Sun
85
3
0
08 Feb 2024
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners
Omead Brandon Pooladzandi
Xi-Lin Li
88
8
0
07 Feb 2024
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard Turner
Alireza Makhzani
ODL
163
13
0
05 Feb 2024
Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
ODL
57
1
0
05 Feb 2024
Neglected Hessian component explains mysteries in Sharpness regularization
Yann N. Dauphin
Atish Agarwala
Hossein Mobahi
FAtt
124
7
0
19 Jan 2024
A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions
Gil Goldshlager
Nilin Abrahamsen
Lin Lin
106
14
0
18 Jan 2024
The LLM Surgeon
Tycho F. A. van der Ouderaa
Markus Nagel
M. V. Baalen
Yuki Markus Asano
Tijmen Blankevoort
114
18
0
28 Dec 2023
On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width
Satoki Ishikawa
Ryo Karakida
110
2
0
19 Dec 2023
Unveiling Empirical Pathologies of Laplace Approximation for Uncertainty Estimation
Maksim Zhdanov
Stanislav Dereka
Sergey Kolesnikov
21
0
0
16 Dec 2023
Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
Wu Lin
Felix Dangel
Runa Eschenhagen
Kirill Neklyudov
Agustinus Kristiadi
Richard Turner
Alireza Makhzani
65
4
0
09 Dec 2023
Merging by Matching Models in Task Parameter Subspaces
Derek Tam
Mohit Bansal
Colin Raffel
MoMe
109
12
0
07 Dec 2023
Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives
Pierre Wolinski
ODL
161
0
0
06 Dec 2023
Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent
Frederik Köhne
Leonie Kreis
Anton Schiela
Roland A. Herzog
89
2
0
28 Nov 2023
Frobenius-Type Norms and Inner Products of Matrices and Linear Maps with Applications to Neural Network Training
Roland A. Herzog
Frederik Köhne
Leonie Kreis
Anton Schiela
18
4
0
26 Nov 2023
Leveraging Function Space Aggregation for Federated Learning at Scale
Nikita Dhawan
Nicole Mitchell
Zachary B. Charles
Zachary Garrett
Gintare Karolina Dziugaite
FedML
84
3
0
17 Nov 2023
A Computationally Efficient Sparsified Online Newton Method
Fnu Devvrit
Sai Surya Duvvuri
Rohan Anil
Vineet Gupta
Cho-Jui Hsieh
Inderjit Dhillon
53
0
0
16 Nov 2023
Riemannian Laplace Approximation with the Fisher Metric
Hanlin Yu
Marcelo Hartmann
Bernardo Williams
Mark Girolami
Arto Klami
109
3
0
05 Nov 2023
Simplifying Transformer Blocks
Bobby He
Thomas Hofmann
109
36
0
03 Nov 2023
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
Runa Eschenhagen
Alexander Immer
Richard Turner
Frank Schneider
Philipp Hennig
135
24
0
01 Nov 2023
Efficient Numerical Algorithm for Large-Scale Damped Natural Gradient Descent
Yixiao Chen
Hao Xie
Han Wang
13
2
0
26 Oct 2023
Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens
Ross M. Clarke
José Miguel Hernández-Lobato
123
2
0
23 Oct 2023
Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks
E. T. Oldewage
Ross M. Clarke
José Miguel Hernández-Lobato
ODL
54
1
0
23 Oct 2023
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization
Siddharth Singh
Zack Sating
A. Bhatele
ODL
72
0
0
18 Oct 2023
Optimising Distributions with Natural Gradient Surrogates
Jonathan So
Richard Turner
43
1
0
18 Oct 2023
Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing
Karim Helwani
Erfan Soltanmohammadi
Michael M. Goodwin
55
0
0
10 Oct 2023
Learning Layer-wise Equivariances Automatically using Gradients
Tycho F. A. van der Ouderaa
Alexander Immer
Mark van der Wilk
MLT
108
14
0
09 Oct 2023
A Meta-Learning Perspective on Transformers for Causal Language Modeling
Xinbo Wu
Lav Varshney
80
7
0
09 Oct 2023
FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation
Xiang Liu
Liangxi Liu
Feiyang Ye
Yunheng Shen
Xia Li
Linshan Jiang
Jialin Li
106
6
0
30 Sep 2023
On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective
Jonathan Wenger
Felix Dangel
Agustinus Kristiadi
99
0
0
29 Sep 2023
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
M. Milling
Andreas Triantafyllopoulos
Iosif Tsangko
Simon Rampp
F. Schlüter
118
3
0
28 Sep 2023
A Primer on Bayesian Neural Networks: Review and Debates
Federico Danieli
Konstantinos Pitas
M. Vladimirova
Vincent Fortuin
BDL
AAML
105
20
0
28 Sep 2023
A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings
Alokendu Mazumder
Rishabh Sabharwal
Manan Tayal
Bhartendu Kumar
Punit Rathore
46
0
0
15 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
98
27
0
12 Sep 2023
The fine print on tempered posteriors
Konstantinos Pitas
Julyan Arbel
72
1
0
11 Sep 2023
CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra
Andres Potapczynski
Marc Finzi
Geoff Pleiss
Andrew Gordon Wilson
57
9
0
06 Sep 2023
Incorporating Neuro-Inspired Adaptability for Continual Learning in Artificial Intelligence
Liyuan Wang
Xingxing Zhang
Qian Li
Mingtian Zhang
Hang Su
Jun Zhu
Yi Zhong
95
57
0
29 Aug 2023
Towards Accelerated Model Training via Bayesian Data Selection
Zhijie Deng
Peng Cui
Jun Zhu
89
5
0
21 Aug 2023
Dual Gauss-Newton Directions for Deep Learning
Vincent Roulet
Mathieu Blondel
ODL
54
0
0
17 Aug 2023
Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang
Shaoshuai Shi
Yue Liu
79
1
0
04 Aug 2023
mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization
Yue Niu
Zalan Fabian
Sunwoo Lee
Mahdi Soltanolkotabi
Salman Avestimehr
ODL
34
2
0
25 Jul 2023
Modify Training Directions in Function Space to Reduce Generalization Error
Yi Yu
Wenlian Lu
Boyu Chen
78
0
0
25 Jul 2023
Variational Monte Carlo on a Budget -- Fine-tuning pre-trained Neural Wavefunctions
Michael Scherbela
Leon Gerard
Philipp Grohs
68
7
0
15 Jul 2023
Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks
Dominik Schnaus
Jongseok Lee
Zorah Lähner
Rudolph Triebel
UQCV
BDL
78
1
0
15 Jul 2023
Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations
Wyatt Bridgman
Reese E. Jones
Mohammad Khalil
66
1
0
12 Jul 2023
Self-Expanding Neural Networks
Rupert Mitchell
Robin Menzenbach
Kristian Kersting
Martin Mundt
110
9
0
10 Jul 2023
Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation
Kirill Neklyudov
J. Nys
Luca Thiede
Juan Carrasquilla
Qiang Liu
Max Welling
Alireza Makhzani
44
13
0
06 Jul 2023
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Peng Mi
Li Shen
Tianhe Ren
Yiyi Zhou
Tianshuo Xu
Xiaoshuai Sun
Tongliang Liu
Rongrong Ji
Dacheng Tao
AAML
73
2
0
30 Jun 2023
Efficient Backdoor Removal Through Natural Gradient Fine-tuning
Nazmul Karim
Abdullah Al Arafat
Umar Khalid
Zhishan Guo
Naznin Rahnavard
AAML
68
1
0
30 Jun 2023
Riemannian Laplace approximations for Bayesian neural networks
Federico Bergamin
Pablo Moreno-Muñoz
Søren Hauberg
Georgios Arvanitidis
BDL
81
7
0
12 Jun 2023
Error Feedback Can Accurately Compress Preconditioners
Ionut-Vlad Modoranu
A. Kalinov
Eldar Kurtic
Elias Frantar
Dan Alistarh
ODL
107
5
0
09 Jun 2023