ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.14133
  4. Cited By
PipeFisher: Efficient Training of Large Language Models Using Pipelining
  and Fisher Information Matrices

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices

25 November 2022
Kazuki Osawa
Shigang Li
Torsten Hoefler
    AI4CE
ArXivPDFHTML

Papers citing "PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices"

10 / 10 papers shown
Title
Influence Functions for Scalable Data Attribution in Diffusion Models
Influence Functions for Scalable Data Attribution in Diffusion Models
Bruno Mlodozeniec
Runa Eschenhagen
Juhan Bae
Alexander Immer
David Krueger
Richard E. Turner
DiffM
TDI
90
4
0
17 Oct 2024
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan
Dongsheng Li
Jiye Liang
Wenjian Wang
Wenjian Wang
Xicheng Lu
64
1
0
01 Dec 2023
Efficient Large-Scale Language Model Training on GPU Clusters Using
  Megatron-LM
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
74
667
0
09 Apr 2021
Sharpness-Aware Minimization for Efficiently Improving Generalization
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
176
1,323
0
03 Oct 2020
Convolutional Neural Network Training with Distributed K-FAC
Convolutional Neural Network Training with Distributed K-FAC
J. G. Pauloski
Zhao Zhang
Lei Huang
Weijia Xu
Ian Foster
43
31
0
01 Jul 2020
Rigging the Lottery: Making All Tickets Winners
Rigging the Lottery: Making All Tickets Winners
Utku Evci
Trevor Gale
Jacob Menick
Pablo Samuel Castro
Erich Elsen
154
592
0
25 Nov 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
186
991
0
01 Apr 2019
Shampoo: Preconditioned Stochastic Tensor Optimization
Shampoo: Preconditioned Stochastic Tensor Optimization
Vineet Gupta
Tomer Koren
Y. Singer
ODL
68
214
0
26 Feb 2018
Overcoming catastrophic forgetting in neural networks
Overcoming catastrophic forgetting in neural networks
J. Kirkpatrick
Razvan Pascanu
Neil C. Rabinowitz
J. Veness
Guillaume Desjardins
...
A. Grabska-Barwinska
Demis Hassabis
Claudia Clopath
D. Kumaran
R. Hadsell
CLL
303
7,410
0
02 Dec 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp
  Minima
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
382
2,922
0
15 Sep 2016
1