2211.14133
Cited By
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
25 November 2022
Kazuki Osawa
Shigang Li
Torsten Hoefler
AI4CE
Papers citing
"PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices"
10 / 10 papers shown
Influence Functions for Scalable Data Attribution in Diffusion Models
Bruno Mlodozeniec
Runa Eschenhagen
Juhan Bae
Alexander Immer
David Krueger
Richard E. Turner
DiffM
TDI
90
4
0
17 Oct 2024
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan
Dongsheng Li
Jiye Liang
Wenjian Wang
Xicheng Lu
64
1
0
01 Dec 2023
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
74
667
0
09 Apr 2021
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
176
1,323
0
03 Oct 2020
Convolutional Neural Network Training with Distributed K-FAC
J. G. Pauloski
Zhao Zhang
Lei Huang
Weijia Xu
Ian Foster
43
31
0
01 Jul 2020
Rigging the Lottery: Making All Tickets Winners
Utku Evci
Trevor Gale
Jacob Menick
Pablo Samuel Castro
Erich Elsen
154
592
0
25 Nov 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
186
991
0
01 Apr 2019
Shampoo: Preconditioned Stochastic Tensor Optimization
Vineet Gupta
Tomer Koren
Y. Singer
ODL
68
214
0
26 Feb 2018
Overcoming catastrophic forgetting in neural networks
J. Kirkpatrick
Razvan Pascanu
Neil C. Rabinowitz
J. Veness
Guillaume Desjardins
...
A. Grabska-Barwinska
Demis Hassabis
Claudia Clopath
D. Kumaran
R. Hadsell
CLL
303
7,410
0
02 Dec 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
382
2,922
0
15 Sep 2016