Optimizing Neural Networks with Kronecker-factored Approximate Curvature

19 March 2015

Papers citing "Optimizing Neural Networks with Kronecker-factored Approximate Curvature"

50 / 645 papers shown

Title
Tradeoffs of Diagonal Fisher Information Matrix Estimators Alexander Soen Ke Sun 85 3 0 08 Feb 2024
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners Omead Brandon Pooladzandi Xi-Lin Li 88 8 0 07 Feb 2024
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective Wu Lin Felix Dangel Runa Eschenhagen Juhan Bae Richard Turner Alireza Makhzani ODL 163 13 0 05 Feb 2024
Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks Yongchang Hao Yanshuai Cao Lili Mou ODL 57 1 0 05 Feb 2024
Neglected Hessian component explains mysteries in Sharpness regularization Yann N. Dauphin Atish Agarwala Hossein Mobahi FAtt 124 7 0 19 Jan 2024
A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions Gil Goldshlager Nilin Abrahamsen Lin Lin 106 14 0 18 Jan 2024
The LLM Surgeon Tycho F. A. van der Ouderaa Markus Nagel M. V. Baalen Yuki Markus Asano Tijmen Blankevoort 114 18 0 28 Dec 2023
On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width Satoki Ishikawa Ryo Karakida 110 2 0 19 Dec 2023
Unveiling Empirical Pathologies of Laplace Approximation for Uncertainty Estimation Maksim Zhdanov Stanislav Dereka Sergey Kolesnikov 21 0 0 16 Dec 2023
Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC Wu Lin Felix Dangel Runa Eschenhagen Kirill Neklyudov Agustinus Kristiadi Richard Turner Alireza Makhzani 65 4 0 09 Dec 2023
Merging by Matching Models in Task Parameter Subspaces Derek Tam Mohit Bansal Colin Raffel MoMe 109 12 0 07 Dec 2023
Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives Pierre Wolinski ODL 161 0 0 06 Dec 2023
Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent Frederik Köhne Leonie Kreis Anton Schiela Roland A. Herzog 89 2 0 28 Nov 2023
Frobenius-Type Norms and Inner Products of Matrices and Linear Maps with Applications to Neural Network Training Roland A. Herzog Frederik Köhne Leonie Kreis Anton Schiela 18 4 0 26 Nov 2023
Leveraging Function Space Aggregation for Federated Learning at Scale Nikita Dhawan Nicole Mitchell Zachary B. Charles Zachary Garrett Gintare Karolina Dziugaite FedML 84 3 0 17 Nov 2023
A Computationally Efficient Sparsified Online Newton Method Fnu Devvrit Sai Surya Duvvuri Rohan Anil Vineet Gupta Cho-Jui Hsieh Inderjit Dhillon 53 0 0 16 Nov 2023
Riemannian Laplace Approximation with the Fisher Metric Hanlin Yu Marcelo Hartmann Bernardo Williams Mark Girolami Arto Klami 109 3 0 05 Nov 2023
Simplifying Transformer Blocks Bobby He Thomas Hofmann 109 36 0 03 Nov 2023
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures Runa Eschenhagen Alexander Immer Richard Turner Frank Schneider Philipp Hennig 135 24 0 01 Nov 2023
Efficient Numerical Algorithm for Large-Scale Damped Natural Gradient Descent Yixiao Chen Hao Xie Han Wang 13 2 0 26 Oct 2023
Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens Ross M. Clarke José Miguel Hernández-Lobato 123 2 0 23 Oct 2023
Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks E. T. Oldewage Ross M. Clarke José Miguel Hernández-Lobato ODL 54 1 0 23 Oct 2023
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization Siddharth Singh Zack Sating A. Bhatele ODL 72 0 0 18 Oct 2023
Optimising Distributions with Natural Gradient Surrogates Jonathan So Richard Turner 43 1 0 18 Oct 2023
Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing Karim Helwani Erfan Soltanmohammadi Michael M. Goodwin 55 0 0 10 Oct 2023
Learning Layer-wise Equivariances Automatically using Gradients Tycho F. A. van der Ouderaa Alexander Immer Mark van der Wilk MLT 108 14 0 09 Oct 2023
A Meta-Learning Perspective on Transformers for Causal Language Modeling Xinbo Wu Lav Varshney 80 7 0 09 Oct 2023
FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation Xiang Liu Liangxi Liu Feiyang Ye Yunheng Shen Xia Li Linshan Jiang Jialin Li 106 6 0 30 Sep 2023
On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective Jonathan Wenger Felix Dangel Agustinus Kristiadi 99 0 0 29 Sep 2023
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification M. Milling Andreas Triantafyllopoulos Iosif Tsangko Simon Rampp F. Schlüter 118 3 0 28 Sep 2023
A Primer on Bayesian Neural Networks: Review and Debates Federico Danieli Konstantinos Pitas M. Vladimirova Vincent Fortuin BDL AAML 105 20 0 28 Sep 2023
A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings Alokendu Mazumder Rishabh Sabharwal Manan Tayal Bhartendu Kumar Punit Rathore 46 0 0 15 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale Hao-Jun Michael Shi Tsung-Hsien Lee Shintaro Iwasaki Jose Gallego-Posada Zhijing Li Kaushik Rangadurai Dheevatsa Mudigere Michael Rabbat ODL 98 27 0 12 Sep 2023
The fine print on tempered posteriors Konstantinos Pitas Julyan Arbel 72 1 0 11 Sep 2023
CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra Andres Potapczynski Marc Finzi Geoff Pleiss Andrew Gordon Wilson 57 9 0 06 Sep 2023
Incorporating Neuro-Inspired Adaptability for Continual Learning in Artificial Intelligence Liyuan Wang Xingxing Zhang Qian Li Mingtian Zhang Hang Su Jun Zhu Yi Zhong 95 57 0 29 Aug 2023
Towards Accelerated Model Training via Bayesian Data Selection Zhijie Deng Peng Cui Jun Zhu 89 5 0 21 Aug 2023
Dual Gauss-Newton Directions for Deep Learning Vincent Roulet Mathieu Blondel ODL 54 0 0 17 Aug 2023
Eva: A General Vectorized Approximation Framework for Second-order Optimization Lin Zhang Shaoshuai Shi Yue Liu 79 1 0 04 Aug 2023
mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization Yue Niu Zalan Fabian Sunwoo Lee Mahdi Soltanolkotabi Salman Avestimehr ODL 34 2 0 25 Jul 2023
Modify Training Directions in Function Space to Reduce Generalization Error Yi Yu Wenlian Lu Boyu Chen 78 0 0 25 Jul 2023
Variational Monte Carlo on a Budget -- Fine-tuning pre-trained Neural Wavefunctions Michael Scherbela Leon Gerard Philipp Grohs 68 7 0 15 Jul 2023
Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks Dominik Schnaus Jongseok Lee Zorah Lähner Rudolph Triebel UQCV BDL 78 1 0 15 Jul 2023
Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations Wyatt Bridgman Reese E. Jones Mohammad Khalil 66 1 0 12 Jul 2023
Self-Expanding Neural Networks Rupert Mitchell Robin Menzenbach Kristian Kersting Martin Mundt 110 9 0 10 Jul 2023
Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation Kirill Neklyudov J. Nys Luca Thiede Juan Carrasquilla Qiang Liu Max Welling Alireza Makhzani 44 13 0 06 Jul 2023
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer Peng Mi Li Shen Tianhe Ren Yiyi Zhou Tianshuo Xu Xiaoshuai Sun Tongliang Liu Rongrong Ji Dacheng Tao AAML 73 2 0 30 Jun 2023
Efficient Backdoor Removal Through Natural Gradient Fine-tuning Nazmul Karim Abdullah Al Arafat Umar Khalid Zhishan Guo Naznin Rahnavard AAML 68 1 0 30 Jun 2023
Riemannian Laplace approximations for Bayesian neural networks Federico Bergamin Pablo Moreno-Muñoz Søren Hauberg Georgios Arvanitidis BDL 81 7 0 12 Jun 2023
Error Feedback Can Accurately Compress Preconditioners Ionut-Vlad Modoranu A. Kalinov Eldar Kurtic Elias Frantar Dan Alistarh ODL 107 5 0 09 Jun 2023