ResearchTrend.AI
arXiv:2406.18820
Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training

27 June 2024
Xinyu Lian
Sam Ade Jacobs
Lev Kurilenko
Masahiro Tanaka
Stas Bekman
Olatunji Ruwase
Minjia Zhang

Papers citing "Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training"

2 of 2 papers shown
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
31 Dec 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019