AdaFisher: Adaptive Second Order Optimization via Fisher Information

26 May 2024 · arXiv:2405.16397
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL

Papers citing "AdaFisher: Adaptive Second Order Optimization via Fisher Information"

50 / 56 papers shown
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
412
0
0
25 Feb 2025
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
E. Weinan
Lei Wu
68
8
0
31 May 2024
Scalable Continuous-time Diffusion Framework for Network Inference and Influence Estimation
Keke Huang
Ruize Gao
Bogdan Cautis
Xiaokui Xiao
35
3
0
05 Mar 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhi-Quan Luo
77
53
0
26 Feb 2024
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard Turner
Alireza Makhzani
ODL
98
13
0
05 Feb 2024
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
Runa Eschenhagen
Alexander Immer
Richard Turner
Frank Schneider
Philipp Hennig
112
23
0
01 Nov 2023
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Zixuan Jiang
Jiaqi Gu
Hanqing Zhu
David Z. Pan
AI4CE
59
17
0
24 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
70
143
0
23 May 2023
ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
Kazuki Osawa
Satoki Ishikawa
Rio Yokota
Shigang Li
Torsten Hoefler
ODL
58
15
0
08 May 2023
Symbolic Discovery of Optimization Algorithms
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
136
373
0
13 Feb 2023
A survey of deep learning optimizers -- first and second order methods
Rohan Kashyap
ODL
61
7
0
28 Nov 2022
How Does Adaptive Optimization Impact Local Neural Network Geometry?
Kaiqi Jiang
Dhruv Malik
Yuanzhi Li
89
18
0
04 Nov 2022
Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
Muhammad N. ElNokrashy
Badr AlKhamissi
Mona T. Diab
MoMe
49
4
0
30 Sep 2022
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer
Valentin Leplat
D. Merkulov
Aleksandr Katrutsa
Daniel Bershatsky
Olga Tsymboi
Ivan Oseledets
78
3
0
29 Sep 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Sadhika Malladi
Kaifeng Lyu
A. Panigrahi
Sanjeev Arora
111
45
0
20 May 2022
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
69
271
0
22 Mar 2022
Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization
Frederik Benzing
ODL
83
25
0
28 Jan 2022
ResNet strikes back: An improved training procedure in timm
Ross Wightman
Hugo Touvron
Hervé Jégou
AI4TS
242
492
0
01 Oct 2021
AdaInject: Injection Based Adaptive Gradient Descent Optimizers for Convolutional Neural Networks
S. Dubey
S. H. Shabbeer Basha
S. Singh
B. B. Chaudhuri
ODL
66
9
0
26 Sep 2021
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani
Steven Walton
Nikhil Shah
Abulikemu Abuduweili
Jiachen Li
Humphrey Shi
110
462
0
12 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
398
21,281
0
25 Mar 2021
SWAD: Domain Generalization by Seeking Flat Minima
Junbum Cha
Sanghyuk Chun
Kyungjae Lee
Han-Cheol Cho
Seunghyun Park
Yunsung Lee
Sungrae Park
MoMe
275
449
0
17 Feb 2021
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Juntang Zhuang
Tommy M. Tang
Yifan Ding
S. Tatikonda
Nicha Dvornek
X. Papademetris
James S. Duncan
ODL
135
510
0
15 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
184
1,342
0
03 Oct 2020
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
Lei Huang
Jie Qin
Yi Zhou
Fan Zhu
Li Liu
Ling Shao
AI4CE
93
267
0
27 Sep 2020
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Z. Yao
A. Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
ODL
91
281
0
01 Jun 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
354
42,299
0
03 Dec 2019
On the Variance of the Adaptive Learning Rate and Beyond
Liyuan Liu
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
Jiawei Han
ODL
220
1,900
0
08 Aug 2019
Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Frederik Kunstner
Lukas Balles
Philipp Hennig
68
215
0
29 May 2019
Searching for MobileNetV3
Andrew G. Howard
Mark Sandler
Grace Chu
Liang-Chieh Chen
Bo Chen
...
Yukun Zhu
Ruoming Pang
Vijay Vasudevan
Quoc V. Le
Hartwig Adam
317
6,737
0
06 May 2019
On the Convergence of Adam and Beyond
Sashank J. Reddi
Satyen Kale
Sanjiv Kumar
85
2,494
0
19 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
204
993
0
01 Apr 2019
Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma
Gabe Montague
Jiayu Ye
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
49
24
0
14 Mar 2019
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo
Yuanhao Xiong
Yan Liu
Xu Sun
ODL
74
602
0
26 Feb 2019
Data Augmentation using Random Image Cropping and Patching for Deep CNNs
Ryo Takahashi
Takashi Matsubara
K. Uehara
55
329
0
22 Nov 2018
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
Xiangyi Chen
Sijia Liu
Ruoyu Sun
Mingyi Hong
53
323
0
08 Aug 2018
Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
Zhilu Zhang
M. Sabuncu
NoLa
76
2,595
0
20 May 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
69
1,043
0
11 Apr 2018
Shampoo: Preconditioned Stochastic Tensor Optimization
Vineet Gupta
Tomer Koren
Y. Singer
ODL
70
219
0
26 Feb 2018
Improved Regularization of Convolutional Neural Networks with Cutout
Terrance Devries
Graham W. Taylor
107
3,758
0
15 Aug 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
125
848
0
13 Aug 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
120
3,675
0
08 Jun 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
56
1,028
0
23 May 2017
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
260
2,842
0
26 Sep 2016
An overview of gradient descent optimization algorithms
Sebastian Ruder
ODL
189
6,179
0
15 Sep 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
711
36,708
0
25 Aug 2016
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
288
8,091
0
13 Aug 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
334
10,467
0
21 Jul 2016
A Kronecker-factored approximate Fisher matrix for convolution layers
Roger C. Grosse
James Martens
ODL
97
261
0
03 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.9K
193,426
0
10 Dec 2015