Spectral-factorized Positive-definite Curvature Learning for NN Training

10 February 2025

Papers cited by "Spectral-factorized Positive-definite Curvature Learning for NN Training"

Old Optimizer, New Norm: An Anthology. Jeremy Bernstein, Laker Newhouse. 30 Sep 2024.
SOAP: Improving and Stabilizing Shampoo using Adam. Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade. 17 Sep 2024.
Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows. Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart. 25 Jun 2024.
Variational Learning is Effective for Large Deep Networks. Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, ..., Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff. 27 Feb 2024.
Why Transformers Need Adam: A Hessian Perspective. Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhimin Luo. 26 Feb 2024.
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective. Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard Turner, Alireza Makhzani. 05 Feb 2024.
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale. Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat. 12 Sep 2023.
Controlling Text-to-Image Diffusion by Orthogonal Finetuning. Zeju Qiu, Wei-yu Liu, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf. 12 Jun 2023.
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles. Chaitanya K. Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, ..., Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer. 01 Jun 2023.
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training. Hong Liu, Zhiyuan Li, David Leo Wright Hall, Percy Liang, Tengyu Ma. 23 May 2023.
Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning. Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt. 20 Feb 2023.
Symbolic Discovery of Optimization Algorithms. Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, ..., Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le. 13 Feb 2023.
Invariance Properties of the Natural Gradient in Overparametrised Systems. Jesse van Oostrum, J. Müller, Nihat Ay. 30 Jun 2022.
Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport. Lingkai Kong, Yuqing Wang, Molei Tao. 27 May 2022.
Better plain ViT baselines for ImageNet-1k. Lucas Beyer, Xiaohua Zhai, Alexander Kolesnikov. 03 May 2022.
Analytic natural gradient updates for Cholesky factor in Gaussian variational approximation. Linda S. L. Tan. 01 Sep 2021.
The Bayesian Learning Rule. Mohammad Emtiyaz Khan, Håvard Rue. 09 Jul 2021.
Tensor Normal Training for Deep Learning Models. Yi Ren, D. Goldfarb. 05 Jun 2021.
On Riemannian Optimization over Positive Definite Matrices with the Bures-Wasserstein Geometry. Andi Han, Bamdev Mishra, Pratik Jawanpuria, Junbin Gao. 01 Jun 2021.
Tractable structured natural gradient descent using local parameterizations. Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt. 15 Feb 2021.
Orthogonal Over-Parameterized Training. Weiyang Liu, Rongmei Lin, Zhen Liu, James M. Rehg, Liam Paull, Li Xiong, Le Song, Adrian Weller. 09 Apr 2020.
Handling the Positive-Definite Constraint in the Bayesian Learning Rule. Wu Lin, Mark Schmidt, Mohammad Emtiyaz Khan. 24 Feb 2020.
Momentum Improves Normalized SGD. Ashok Cutkosky, Harsh Mehta. 09 Feb 2020.
Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform. Jun Li, Fuxin Li, S. Todorovic. 04 Feb 2020.
Optimizing Millions of Hyperparameters by Implicit Differentiation. Jonathan Lorraine, Paul Vicol, David Duvenaud. 06 Nov 2019.
On Empirical Comparisons of Optimizers for Deep Learning. Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl. 11 Oct 2019.
Variational Bayes on Manifolds. Minh-Ngoc Tran, D. Nguyen, Duy Nguyen. 08 Aug 2019.
Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations. Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt. 07 Jun 2019.
Practical Deep Learning with Bayesian Principles. Kazuki Osawa, S. Swaroop, Anirudh Jain, Runa Eschenhagen, Richard Turner, Rio Yokota, Mohammad Emtiyaz Khan. 06 Jun 2019.
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Y. Yoo. 13 May 2019.
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks. Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu. 18 Jun 2018.
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam. Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Y. Gal, Akash Srivastava. 13 Jun 2018.
Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models. Hugh Salimbeni, Stefanos Eleftheriadis, J. Hensman. 24 Mar 2018.
Noisy Natural Gradient as Variational Inference. Guodong Zhang, Shengyang Sun, David Duvenaud, Roger C. Grosse. 06 Dec 2017.
mixup: Beyond Empirical Risk Minimization. Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, David Lopez-Paz. 25 Oct 2017.
Mixed Precision Training. Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu. 10 Oct 2017.
Preconditioned Stochastic Gradient Descent. Xi-Lin Li. 14 Dec 2015.
Optimizing Neural Networks with Kronecker-factored Approximate Curvature. James Martens, Roger C. Grosse. 19 Mar 2015.
Manopt, a Matlab toolbox for optimization on manifolds. Nicolas Boumal, Bamdev Mishra, P.-A. Absil, R. Sepulchre. 23 Aug 2013.