GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy

Abstract

Graph neural networks (GNNs) have been demonstrated as a powerful tool for analysing non-Euclidean graph data. However, the lack of efficient distributed graph learning (GL) systems severely hinders applications of GNNs, especially when graphs are big and GNNs are relatively deep. Herein, we present GraphTheta, a novel distributed and scalable GL system implemented in the vertex-centric graph programming model. GraphTheta is the first GL system built upon distributed graph processing, with neural network operators implemented as user-defined functions. The system supports multiple training strategies and enables efficient and scalable learning on big graphs using distributed (virtual) machines with low memory each. To facilitate graph convolution implementations, GraphTheta puts forward a new GL abstraction named NN-TGAR to bridge the gap between graph processing and graph deep learning. A distributed graph engine is proposed to conduct stochastic gradient descent optimization with hybrid-parallel execution. Moreover, we add support for a new cluster-batched training strategy in addition to global-batch and mini-batch. We evaluate GraphTheta on a number of datasets with network sizes ranging from small to large scale. Experimental results show that GraphTheta scales well to 1,024 workers when training an in-house developed GNN on an industry-scale Alipay dataset of 1.4 billion nodes and 4.1 billion attributed edges, using a cluster of CPU virtual machines (Docker containers) with small memory each (5–12 GB). Moreover, GraphTheta obtains comparable or better prediction results than state-of-the-art GNN implementations, demonstrating that it learns GNNs as effectively as existing frameworks, and it outperforms DistDGL by up to 2.02× with better scalability. To the best of our knowledge, this work presents the largest edge-attributed GNN learning task reported in the literature.
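To make the vertex-centric idea behind the abstract concrete, the following is a minimal, hypothetical sketch of one graph-convolution layer expressed as user-defined functions in the gather/reduce/apply style that an abstraction like NN-TGAR suggests. All names here (GCNLayerProgram, gather, reduce, apply) are illustrative assumptions for exposition, not GraphTheta's actual API.

```python
# Hypothetical sketch: a GNN layer as per-vertex/per-edge user-defined
# functions, in the spirit of a vertex-centric GL abstraction. Names and
# signatures are assumptions, not GraphTheta's real interface.
import numpy as np


class GCNLayerProgram:
    """One graph-convolution layer; a vertex-centric engine would invoke
    gather() per edge, reduce() per destination vertex, apply() per vertex."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(in_dim, out_dim))

    def gather(self, src_state: np.ndarray, edge_attr: np.ndarray) -> np.ndarray:
        # Message from a source vertex along an attributed edge
        # (elementwise edge weighting of the neighbor's state).
        return src_state * edge_attr

    def reduce(self, messages: list[np.ndarray]) -> np.ndarray:
        # Combine all incoming messages at a vertex (sum aggregation).
        return np.sum(messages, axis=0)

    def apply(self, aggregated: np.ndarray) -> np.ndarray:
        # Per-vertex neural-network transform: linear layer + ReLU.
        return np.maximum(aggregated @ self.W, 0.0)


# Toy single-machine forward pass: 3 vertices, 4-dim features,
# two attributed edges (0 -> 2 and 1 -> 2).
feats = np.ones((3, 4))
edges = [(0, 2, np.full(4, 0.5)), (1, 2, np.full(4, 0.5))]
layer = GCNLayerProgram(in_dim=4, out_dim=8)

inbox: dict[int, list[np.ndarray]] = {}
for src, dst, attr in edges:
    inbox.setdefault(dst, []).append(layer.gather(feats[src], attr))
out = {v: layer.apply(layer.reduce(msgs)) for v, msgs in inbox.items()}
print(out[2].shape)  # (8,)
```

In a distributed engine, the same three user-defined functions would run over graph partitions, with the engine handling message routing between workers and the backward pass for stochastic gradient descent; the sketch above only illustrates the programming pattern.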
