LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models

arXiv:2109.11762 · 24 September 2021
William Won, Saeed Rashidi, Sudarshan Srinivasan, T. Krishna
Community: AI4CE

Papers citing "LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models"

4 citing papers shown.

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, T. Krishna
24 Mar 2023

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs
Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, T. Krishna
22 Jul 2022 · OOD

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019 · MoE