ResearchTrend.AI
Spectral-factorized Positive-definite Curvature Learning for NN Training


10 February 2025
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse

Papers cited by "Spectral-factorized Positive-definite Curvature Learning for NN Training"

39 / 39 papers shown
Old Optimizer, New Norm: An Anthology
Jeremy Bernstein
Laker Newhouse
ODL
82
17
0
30 Sep 2024
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
118
34
0
17 Sep 2024
Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows
Yifan Chen
Daniel Zhengyu Huang
Jiaoyang Huang
Sebastian Reich
Andrew M. Stuart
66
5
0
25 Jun 2024
Variational Learning is Effective for Large Deep Networks
Yuesong Shen
Nico Daheim
Bai Cong
Peter Nickl
Gian Maria Marconi
...
Rio Yokota
Iryna Gurevych
Daniel Cremers
Mohammad Emtiyaz Khan
Thomas Möllenhoff
53
25
0
27 Feb 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhi-Quan Luo
64
49
0
26 Feb 2024
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard Turner
Alireza Makhzani
ODL
83
12
0
05 Feb 2024
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
46
25
0
12 Sep 2023
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Zeju Qiu
Weiyang Liu
Haiwen Feng
Yuxuan Xue
Yao Feng
Zhen Liu
Dan Zhang
Adrian Weller
Bernhard Schölkopf
DiffM
66
148
0
12 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
68
171
0
01 Jun 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
66
139
0
23 May 2023
Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning
Wu Lin
Valentin Duruisseaux
Melvin Leok
Frank Nielsen
Mohammad Emtiyaz Khan
Mark Schmidt
64
7
0
20 Feb 2023
Symbolic Discovery of Optimization Algorithms
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
110
367
0
13 Feb 2023
Invariance Properties of the Natural Gradient in Overparametrised Systems
Jesse van Oostrum
J. Müller
Nihat Ay
30
9
0
30 Jun 2022
Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport
Lingkai Kong
Yuqing Wang
Molei Tao
ODL
51
9
0
27 May 2022
Better plain ViT baselines for ImageNet-1k
Lucas Beyer
Xiaohua Zhai
Alexander Kolesnikov
ViT
VLM
35
116
0
03 May 2022
Analytic natural gradient updates for Cholesky factor in Gaussian variational approximation
Linda S. L. Tan
39
11
0
01 Sep 2021
The Bayesian Learning Rule
Mohammad Emtiyaz Khan
Håvard Rue
BDL
75
73
0
09 Jul 2021
Tensor Normal Training for Deep Learning Models
Yi Ren
D. Goldfarb
41
27
0
05 Jun 2021
On Riemannian Optimization over Positive Definite Matrices with the Bures-Wasserstein Geometry
Andi Han
Bamdev Mishra
Pratik Jawanpuria
Junbin Gao
56
38
0
01 Jun 2021
Tractable structured natural gradient descent using local parameterizations
Wu Lin
Frank Nielsen
Mohammad Emtiyaz Khan
Mark Schmidt
44
29
0
15 Feb 2021
Orthogonal Over-Parameterized Training
Weiyang Liu
Rongmei Lin
Zhen Liu
James M. Rehg
Liam Paull
Li Xiong
Le Song
Adrian Weller
55
41
0
09 Apr 2020
Handling the Positive-Definite Constraint in the Bayesian Learning Rule
Wu Lin
Mark Schmidt
Mohammad Emtiyaz Khan
BDL
49
35
0
24 Feb 2020
Momentum Improves Normalized SGD
Ashok Cutkosky
Harsh Mehta
ODL
57
122
0
09 Feb 2020
Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform
Jun Li
Fuxin Li
S. Todorovic
38
104
0
04 Feb 2020
Optimizing Millions of Hyperparameters by Implicit Differentiation
Jonathan Lorraine
Paul Vicol
David Duvenaud
DD
104
409
0
06 Nov 2019
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
62
259
0
11 Oct 2019
Variational Bayes on Manifolds
Minh-Ngoc Tran
D. Nguyen
Duy Nguyen
45
23
0
08 Aug 2019
Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations
Wu Lin
Mohammad Emtiyaz Khan
Mark Schmidt
BDL
31
69
0
07 Jun 2019
Practical Deep Learning with Bayesian Principles
Kazuki Osawa
S. Swaroop
Anirudh Jain
Runa Eschenhagen
Richard Turner
Rio Yokota
Mohammad Emtiyaz Khan
BDL
UQCV
72
243
0
06 Jun 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
581
4,735
0
13 May 2019
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen
Dongruo Zhou
Yiqi Tang
Ziyan Yang
Yuan Cao
Quanquan Gu
ODL
63
192
0
18 Jun 2018
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Mohammad Emtiyaz Khan
Didrik Nielsen
Voot Tangkaratt
Wu Lin
Y. Gal
Akash Srivastava
ODL
87
269
0
13 Jun 2018
Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models
Hugh Salimbeni
Stefanos Eleftheriadis
J. Hensman
BDL
41
85
0
24 Mar 2018
Noisy Natural Gradient as Variational Inference
Guodong Zhang
Shengyang Sun
David Duvenaud
Roger B. Grosse
ODL
59
211
0
06 Dec 2017
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
243
9,687
0
25 Oct 2017
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
136
1,779
0
10 Oct 2017
Preconditioned Stochastic Gradient Descent
Xi-Lin Li
20
93
0
14 Dec 2015
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
James Martens
Roger B. Grosse
ODL
69
999
0
19 Mar 2015
Manopt, a Matlab toolbox for optimization on manifolds
Nicolas Boumal
Bamdev Mishra
P.-A. Absil
R. Sepulchre
82
1,023
0
23 Aug 2013