A Comparative Analysis of Distributed Training Strategies for GPT-2

24 May 2024

Papers citing "A Comparative Analysis of Distributed Training Strategies for GPT-2"

Title: Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Authors: M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE 245 1,821 0
Date: 17 Sep 2019