Efficient Parallelization Layouts for Large-Scale Distributed Model Training
arXiv: 2311.05610 (9 November 2023)
Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo
Papers citing "Efficient Parallelization Layouts for Large-Scale Distributed Model Training" (4 / 4 papers shown):
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn
20 Nov 2024
Stable LM 2 1.6B Technical Report
Marco Bellagente, J. Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, ..., Paulo Rocha, Harry Saini, H. Teufel, Niccoló Zanichelli, Carlos Riquelme
27 Feb 2024
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019