FedGCN: Convergence and Communication Tradeoffs in Federated Training of Graph Convolutional Networks
Methods for training models on graphs distributed across multiple clients have recently grown in popularity, driven by the size of these graphs and by regulations, such as GDPR in the EU, that require data to stay where it is generated. However, a single connected graph cannot be partitioned disjointly across multiple clients, because cross-client edges connect nodes held by different clients. Thus, distributed methods for training a model on a single graph either incur significant communication overhead between clients or lose information that would otherwise be available for training. We introduce the Federated Graph Convolutional Network (FedGCN) algorithm, which uses federated learning to train GCN models for semi-supervised node classification on large graphs with fast convergence and little communication. Compared to prior methods that require communication among clients at each training round, FedGCN clients communicate with the central server only in a single pre-training step, greatly reducing communication costs. We theoretically analyze the tradeoff between FedGCN's convergence rate and communication cost under different data distributions, and we introduce a general framework that can be used to analyze all edge-completion-based GCN training algorithms. Experimental results show that FedGCN achieves 51.7% faster convergence on average and at least 100× lower communication cost than prior work.
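To make the single pre-training communication step more concrete, here is a minimal sketch of one way it could work. The abstract does not say what the clients send, so the sketch assumes that each client sends the server per-node sums of the features it owns, and that the server adds these partial sums together. It also assumes that each client knows the edges incident to its own nodes. The function names (`local_partial_sums`, `server_aggregate`) and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def local_partial_sums(owned_features, edges):
    """Client side (assumed protocol): for each owned node j, add x_j to the
    running sum of every node i that is j itself or a neighbor of j."""
    dim = len(next(iter(owned_features.values())))
    partial = defaultdict(lambda: [0.0] * dim)
    for j, x_j in owned_features.items():
        targets = {j}  # self-loop, as in the usual GCN aggregation
        for u, v in edges:
            if u == j:
                targets.add(v)
            elif v == j:
                targets.add(u)
        for i in targets:
            partial[i] = [a + b for a, b in zip(partial[i], x_j)]
    return dict(partial)

def server_aggregate(all_partials):
    """Server side (assumed protocol): sum the clients' partial sums per node."""
    total = {}
    for partial in all_partials:
        for i, vec in partial.items():
            if i in total:
                total[i] = [a + b for a, b in zip(total[i], vec)]
            else:
                total[i] = list(vec)
    return total

# Toy path graph 0 - 1 - 2 - 3; the edge (1, 2) crosses the client boundary.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [1.0, 1.0]}
edges = [(0, 1), (1, 2), (2, 3)]
client_a = {n: features[n] for n in (0, 1)}
client_b = {n: features[n] for n in (2, 3)}

# One round of communication: each client uploads its partial sums once.
aggregated = server_aggregate([
    local_partial_sums(client_a, edges),
    local_partial_sums(client_b, edges),
])

# Reference: the same aggregation computed with the whole graph in one place.
centralized = local_partial_sums(features, edges)
```

In this toy setting, `aggregated` matches `centralized`, so the information carried by the cross-client edge is recovered after a single exchange. A real system would presumably return to each client only the rows for its own nodes and would need to protect the uploaded sums; neither concern is modeled here.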