A Kronecker-factored approximate Fisher matrix for convolution layers

3 February 2016

Papers citing "A Kronecker-factored approximate Fisher matrix for convolution layers"

FedBEns: One-Shot Federated Learning based on Bayesian Ensemble Jacopo Talpini Marco Savi Giovanni Neglia FedML 76 0 0 19 Mar 2025
Position: Curvature Matrices Should Be Democratized via Linear Operators Felix Dangel Runa Eschenhagen Weronika Ormaniec Andres Fernandez Lukas Tatzel Agustinus Kristiadi 58 3 0 31 Jan 2025
ANaGRAM: A Natural Gradient Relative to Adapted Model for efficient PINNs learning Nilo Schwencke Cyril Furtlehner 64 1 0 14 Dec 2024
Debiasing Mini-Batch Quadratics for Applications in Deep Learning Lukas Tatzel Bálint Mucsányi Osane Hackel Philipp Hennig 43 0 0 18 Oct 2024
Influence Functions for Scalable Data Attribution in Diffusion Models Bruno Mlodozeniec Runa Eschenhagen Juhan Bae Alexander Immer David Krueger Richard E. Turner TDI DiffM 75 4 0 17 Oct 2024
Second-Order Min-Max Optimization with Lazy Hessians Lesi Chen Chengchang Liu Jingzhao Zhang 43 1 0 12 Oct 2024
Scalable Bayesian Learning with posteriors Samuel Duffield Kaelan Donatella Johnathan Chiu Phoebe Klett Daniel Simpson BDL UQCV 62 1 0 31 May 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information Damien Martins Gomes Yanlei Zhang Eugene Belilovsky Guy Wolf Mahdi S. Hosseini ODL 76 2 0 26 May 2024
Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC Wu Lin Felix Dangel Runa Eschenhagen Kirill Neklyudov Agustinus Kristiadi Richard Turner Alireza Makhzani 22 3 0 09 Dec 2023
Eva: A General Vectorized Approximation Framework for Second-order Optimization Lin Zhang S. Shi Bo-wen Li 28 1 0 04 Aug 2023
Modify Training Directions in Function Space to Reduce Generalization Error Yi Yu Wenlian Lu Boyu Chen 24 0 0 25 Jul 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training Hong Liu Zhiyuan Li David Leo Wright Hall Percy Liang Tengyu Ma VLM 46 128 0 23 May 2023
ISAAC Newton: Input-based Approximate Curvature for Newton's Method Felix Petersen Tobias Sutter Christian Borgelt Dongsung Huh Hilde Kuehne Yuekai Sun Oliver Deussen ODL 31 5 0 01 May 2023
Efficient Activation Function Optimization through Surrogate Modeling G. Bingham Risto Miikkulainen 16 2 0 13 Jan 2023
Brand New K-FACs: Speeding up K-FAC with Online Decomposition Updates C. Puiu 14 2 0 16 Oct 2022
Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization Tran van Sang Mhd Irvan R. Yamaguchi Toshiyuki Nakata 13 1 0 11 Oct 2022
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach Peng Mi Li Shen Tianhe Ren Yiyi Zhou Xiaoshuai Sun Rongrong Ji Dacheng Tao AAML 27 69 0 11 Oct 2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning Lin Zhang S. Shi Wei Wang Bo-wen Li 36 10 0 30 Jun 2022
Information Geometry of Dropout Training Masanari Kimura H. Hino 14 2 0 22 Jun 2022
Debugging using Orthogonal Gradient Descent Narsimha Chilkuri C. Eliasmith 21 1 0 17 Jun 2022
Amortized Proximal Optimization Juhan Bae Paul Vicol Jeff Z. HaoChen Roger C. Grosse ODL 25 14 0 28 Feb 2022
Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations Alexander Immer Tycho F. A. van der Ouderaa Gunnar Rätsch Vincent Fortuin Mark van der Wilk BDL 36 44 0 22 Feb 2022
A Geometric Understanding of Natural Gradient Qinxun Bai S. Rosenberg Wei Xu 21 2 0 13 Feb 2022
Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization Frederik Benzing ODL 40 23 0 28 Jan 2022
Merging Models with Fisher-Weighted Averaging Michael Matena Colin Raffel FedML MoMe 29 351 0 18 Nov 2021
Kronecker Factorization for Preventing Catastrophic Forgetting in Large-scale Medical Entity Linking Denis Jered McInerney Luyang Kong Kristjan Arumae Byron C. Wallace Parminder Bhatia CLL 24 1 0 11 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey Xiaoxin He Fuzhao Xue Xiaozhe Ren Yang You 24 14 0 01 Nov 2021
Nys-Newton: Nyström-Approximated Curvature for Stochastic Optimization Dinesh Singh Hardik Tankaria M. Yamada ODL 42 2 0 16 Oct 2021
Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks S. Shi Lin Zhang Bo-wen Li 40 9 0 14 Jul 2021
M-FAC: Efficient Matrix-Free Approximations of Second-Order Information Elias Frantar Eldar Kurtic Dan Alistarh 13 57 0 07 Jul 2021
A Survey of Uncertainty in Deep Neural Networks J. Gawlikowski Cedrique Rovile Njieutcheu Tassi Mohsin Ali Jongseo Lee Matthias Humt ... R. Roscher Muhammad Shahzad Wen Yang R. Bamler Xiaoxiang Zhu BDL UQCV OOD 35 1,109 0 07 Jul 2021
Robust Out-of-Distribution Detection on Deep Probabilistic Generative Models Jaemoo Choi Changyeon Yoon Jeongwoo Bae Myung-joo Kang OODD 30 4 0 15 Jun 2021
TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion Saeed Soori Bugra Can Baourun Mu Mert Gurbuzbalaban M. Dehnavi 24 10 0 07 Jun 2021
LQF: Linear Quadratic Fine-Tuning Alessandro Achille Aditya Golatkar Avinash Ravichandran M. Polito Stefano Soatto 26 27 0 21 Dec 2020
A Trace-restricted Kronecker-Factored Approximation to Natural Gradient Kai-Xin Gao Xiaolei Liu Zheng-Hai Huang Min Wang Zidong Wang Dachuan Xu F. Yu 24 11 0 21 Nov 2020
Transform Quantization for CNN (Convolutional Neural Network) Compression Sean I. Young Wang Zhe David S. Taubman B. Girod MQ 29 69 0 02 Sep 2020
Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization Neha S. Wadia Daniel Duckworth S. Schoenholz Ethan Dyer Jascha Narain Sohl-Dickstein 24 13 0 17 Aug 2020
A Differential Game Theoretic Neural Optimizer for Training Residual Networks Guan-Horng Liu T. Chen Evangelos A. Theodorou 24 2 0 17 Jul 2020
When Does Preconditioning Help or Hurt Generalization? S. Amari Jimmy Ba Roger C. Grosse Xuechen Li Atsushi Nitanda Taiji Suzuki Denny Wu Ji Xu 34 32 0 18 Jun 2020
Continual Learning with Extended Kronecker-factored Approximate Curvature Janghyeon Lee H. Hong Donggyu Joo Junmo Kim CLL 20 52 0 16 Apr 2020
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems Tianle Cai Ruiqi Gao Jikai Hou Siyu Chen Dong Wang Di He Zhihua Zhang Liwei Wang ODL 21 57 0 28 May 2019
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise Yeming Wen Kevin Luk Maxime Gazeau Guodong Zhang Harris Chan Jimmy Ba ODL 20 22 0 21 Feb 2019
Fisher Information and Natural Gradient Learning of Random Deep Networks S. Amari Ryo Karakida Masafumi Oizumi 14 34 0 22 Aug 2018
Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis Thomas George César Laurent Xavier Bouthillier Nicolas Ballas Pascal Vincent ODL 29 150 0 11 Jun 2018
Meta-Learning with Hessian-Free Approach in Deep Neural Nets Training Boyu Chen Wenlian Lu Ernest Fokoue 21 1 0 22 May 2018
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation Yuhuai Wu Elman Mansimov Shun Liao Roger C. Grosse Jimmy Ba OffRL 22 622 0 17 Aug 2017
Kronecker Recurrent Units C. Jose Moustapha Cissé F. Fleuret ODL 24 45 0 29 May 2017