  3. 2302.00247
TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation

1 February 2023
Ziji Shi
Le Jiang
Ang Wang
Jie Zhang
Xianyan Jia
Yong Li
Chencan Wu
Jialin Li
Wei Lin
Abstract

Model parallelism has become necessary to train large neural networks. However, finding a suitable model-parallel schedule for an arbitrary neural network is a non-trivial task due to the exploding search space. In this work, we present TAP, a model parallelism framework that automatically searches for the best data- and tensor-parallel schedules. Leveraging the key insight that a neural network can be represented as a directed acyclic graph, within which only a limited set of frequent subgraphs exist, we design a graph pruning algorithm that efficiently folds the search space. TAP runs at sub-linear complexity with respect to the neural network size. Experiments show that TAP is 20×–160× faster than the state-of-the-art automatic parallelism framework, and the performance of its discovered schedules is competitive with expert-engineered ones.
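The key insight, that a network's DAG contains only a small set of frequently repeated subgraphs, can be illustrated with a short sketch. The Python below is not TAP's actual algorithm; the block structure, candidate shardings, and toy cost model are assumptions made purely to show why folding repeated blocks shrinks the schedule search.

```python
# A minimal, illustrative sketch (not the authors' implementation) of the idea
# behind search-space folding: identical subgraphs (e.g. repeated transformer
# blocks) only need to have a parallel schedule searched once, and the result
# is reused for every repetition.

from collections import Counter
from itertools import product

def build_toy_model(num_layers=12):
    """Toy model: a list of blocks, each block a tuple of (op, weight_shape)
    layers. Identical tuples stand in for frequent subgraphs of the DAG."""
    attention = (("matmul", (1024, 1024)), ("softmax", None), ("matmul", (1024, 1024)))
    mlp = (("matmul", (1024, 4096)), ("gelu", None), ("matmul", (4096, 1024)))
    return [attention, mlp] * (num_layers // 2)

def fold_repeated_blocks(model):
    """Collapse identical blocks so the schedule search runs once per unique block."""
    return Counter(model)  # {unique_block: repetitions}

def candidate_schedules(block):
    """Enumerate coarse tensor-parallel choices for each matmul in a block:
    shard the weight along rows, along columns, or replicate it."""
    num_matmuls = sum(1 for op, _ in block if op == "matmul")
    return list(product(["row", "col", "replicate"], repeat=num_matmuls))

def schedule_cost(block, schedule, devices=4):
    """Toy cost: sharded matmuls divide compute across devices, and column
    sharding adds a stand-in communication term. A real system would use
    measured or modelled costs instead."""
    cost = 0.0
    matmul_shapes = [shape for op, shape in block if op == "matmul"]
    for shape, choice in zip(matmul_shapes, schedule):
        flops = shape[0] * shape[1]
        if choice == "replicate":
            cost += flops
        else:
            cost += flops / devices
            if choice == "col":
                cost += 0.1 * shape[1]  # stand-in all-reduce cost
    return cost

model = build_toy_model()
unique_blocks = fold_repeated_blocks(model)
print(f"{len(model)} blocks total, {len(unique_blocks)} unique -> "
      f"search space folded by {len(model) // len(unique_blocks)}x")

for block, reps in unique_blocks.items():
    best = min(candidate_schedules(block), key=lambda s: schedule_cost(block, s))
    print(f"block repeated {reps}x: best schedule {best}")
```

Because every repetition of a block reuses the schedule found for its unique representative, the search cost grows with the number of distinct blocks rather than the total network depth, which is the sense in which the search can stay sub-linear in the network size.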
