Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks

29 November 2018
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka
ODL
arXiv:1811.12019

Papers citing "Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks"

24 / 24 papers shown
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
Liming Liu, Zhenghao Xu, Zixuan Zhang, Hao Kang, Zichong Li, Chen Liang, Weizhu Chen, T. Zhao
24 Feb 2025

Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Lukas Tatzel, Bálint Mucsányi, Osane Hackel, Philipp Hennig
18 Oct 2024

SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade
17 Sep 2024

An Improved Empirical Fisher Approximation for Natural Gradient Descent
Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland
10 Jun 2024

A Differential Geometric View and Explainability of GNN on Evolving Graphs
Yazheng Liu, Xi Zhang, Sihong Xie
11 Mar 2024

Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang, S. Shi, Bo-wen Li
04 Aug 2023

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler
ODL
08 May 2023

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Kazuki Osawa, Shigang Li, Torsten Hoefler
AI4CE
25 Nov 2022

A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada
AI4CE
21 Oct 2022

Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li, Kazuki Osawa, Torsten Hoefler
14 Sep 2022

Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs
Severin Reiz, T. Neckel, H. Bungartz
ODL
03 Aug 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, S. Shi, Wei Wang, Bo-wen Li
30 Jun 2022

Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization
Frederik Benzing
ODL
28 Jan 2022

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You
01 Nov 2021

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
29 Sep 2021

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
Grzegorz Kwaśniewski, Marko Kabić, Tal Ben-Nun, A. Ziogas, Jens Eirik Saethre, ..., Timo Schneider, Maciej Besta, Anton Kozhevnikov, J. VandeVondele, Torsten Hoefler
20 Aug 2021

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
S. Shi, Lin Zhang, Bo-wen Li
14 Jul 2021

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
Elias Frantar, Eldar Kurtic, Dan Alistarh
07 Jul 2021

A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Kai-Xin Gao, Xiaolei Liu, Zheng-Hai Huang, Min Wang, Zidong Wang, Dachuan Xu, F. Yu
21 Nov 2020

A block coordinate descent optimizer for classification problems exploiting convexity
Ravi G. Patel, N. Trask, Mamikon A. Gulian, E. Cyr
ODL
17 Jun 2020

What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective
Qilong Wang, Li Zhang, Banggu Wu, Dongwei Ren, P. Li, W. Zuo, Q. Hu
25 Mar 2020

Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
19 Dec 2019

Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Frederik Kunstner, Lukas Balles, Philipp Hennig
29 May 2019

Augment your batch: better training with larger batches
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
ODL
27 Jan 2019