Graph neural networks (GNNs) have been predominantly driven by message passing, where node representations are iteratively updated via local neighborhood aggregation. Despite its success, message passing suffers from fundamental limitations, including constrained expressiveness, over-smoothing, over-squashing, and limited capacity to model long-range dependencies. These issues hinder scalability: increasing the data size or model size often fails to yield improved performance, limiting the viability of GNNs as backbones for graph foundation models. In this work, we explore pathways beyond message passing and introduce the Generative Graph Pattern Machine (GPM), a generative Transformer pre-training framework for graphs. GPM represents graph instances (nodes, edges, or entire graphs) as sequences of substructures and employs generative pre-training over these sequences to learn generalizable, transferable representations. Empirically, GPM demonstrates strong scalability: on the ogbn-arxiv benchmark, it continues to improve with model sizes of up to 60M parameters, outperforming prior generative approaches that plateau at significantly smaller scales (e.g., 3M). In addition, we systematically analyze the model design space, highlighting the key architectural choices that contribute to its scalability and generalization. Across diverse tasks, including node classification, graph classification, and transfer learning, GPM consistently outperforms strong baselines, establishing a compelling foundation for scalable graph learning. The code and dataset are available at this https URL.
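Below is a minimal, illustrative sketch of the substructure-sequence idea described above, not the authors' implementation: it samples short random walks as stand-in substructures around an anchor node, encodes each one, and pre-trains a small Transformer to predict the embedding of the next substructure in the sequence. The toy graph, the random-walk sampler, the mean-pooling encoder, and the embedding-regression objective are all simplifying assumptions; the paper's actual substructure extraction and generative objective may differ.

```python
# Hypothetical sketch (not the GPM code): a node is represented as a sequence of
# sampled substructures (here, short random walks), and a small Transformer is
# pre-trained generatively to predict the next substructure in the sequence.
import random
import torch
import torch.nn as nn

# Toy graph as an adjacency list: node -> list of neighbors (assumed for illustration).
ADJ = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
NUM_NODES, WALK_LEN, SEQ_LEN, DIM = len(ADJ), 4, 8, 64

def sample_substructure(node: int) -> list[int]:
    """Sample one substructure around `node` as a short random walk."""
    walk, cur = [node], node
    for _ in range(WALK_LEN - 1):
        cur = random.choice(ADJ[cur])
        walk.append(cur)
    return walk

class SubstructureSequenceModel(nn.Module):
    """Encodes each substructure, then models the sequence autoregressively."""
    def __init__(self):
        super().__init__()
        self.node_emb = nn.Embedding(NUM_NODES, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, DIM)  # predicts the next substructure embedding

    def encode_substructure(self, walk: list[int]) -> torch.Tensor:
        # Simple mean pooling over node embeddings stands in for a substructure encoder.
        return self.node_emb(torch.tensor(walk)).mean(dim=0)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # Causal mask so position t only attends to substructures at positions <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        return self.head(self.transformer(seq, mask=mask))

model = SubstructureSequenceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    # Build a substructure sequence for a random anchor node.
    anchor = random.randrange(NUM_NODES)
    subs = torch.stack([model.encode_substructure(sample_substructure(anchor))
                        for _ in range(SEQ_LEN)]).unsqueeze(0)  # (1, SEQ_LEN, DIM)
    pred = model(subs[:, :-1])        # predict the embedding of the next substructure
    loss = nn.functional.mse_loss(pred, subs[:, 1:].detach())
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final pre-training loss: {loss.item():.4f}")
```

The same skeleton extends to edge- and graph-level instances by changing how the anchor and its substructures are sampled; downstream tasks would then fine-tune or probe the learned sequence representations.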
@article{wang2025_2505.16130,
  title   = {Scalable Graph Generative Modeling via Substructure Sequences},
  author  = {Zehong Wang and Zheyuan Zhang and Tianyi Ma and Chuxu Zhang and Yanfang Ye},
  journal = {arXiv preprint arXiv:2505.16130},
  year    = {2025}
}