HSG-12M: A Large-Scale Spatial Multigraph Dataset

Abstract

Existing graph benchmarks assume non-spatial, simple edges, collapsing physically distinct paths into a single link. We introduce HSG-12M, the first large-scale dataset of spatial multigraphs: graphs embedded in a metric space where multiple geometrically distinct trajectories between two nodes are retained as separate edges. HSG-12M contains 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs across 1401 characteristic-polynomial classes, derived from 177 TB of spectral potential data. Each graph encodes the full geometry of a 1-D crystal's energy spectrum on the complex plane, producing diverse, physics-grounded topologies that transcend conventional node-coordinate datasets. To enable future extensions, we release Poly2Graph: a high-performance, open-source pipeline that maps arbitrary 1-D crystal Hamiltonians to spectral graphs. Benchmarks with popular GNNs expose new challenges in learning from multi-edge geometry at scale. Beyond its practical utility, we show that spectral graphs serve as universal topological fingerprints of polynomials, vectors, and matrices, forging a new algebra-to-graph link. HSG-12M lays the groundwork for geometry-aware graph learning and new opportunities for data-driven scientific discovery in condensed matter physics and beyond.
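
To make the definition of a spatial multigraph concrete, the following is a minimal illustrative sketch in plain Python. It is not the HSG-12M data format or the Poly2Graph API; the class `SpatialMultigraph`, its methods, and the toy coordinates are all hypothetical names and values chosen for illustration. It shows two geometrically distinct trajectories between the same pair of nodes being kept as separate edges, and how collapsing them into a simple graph discards that geometry.

```python
from dataclasses import dataclass, field
from math import hypot

# A point on the complex plane, stored as (real part, imaginary part).
Point = tuple[float, float]


def polyline_length(path: list[Point]) -> float:
    """Sum of straight-segment lengths along a sampled trajectory."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(path, path[1:])
    )


@dataclass
class SpatialMultigraph:
    """Hypothetical container: nodes carry coordinates, edges carry paths."""

    nodes: dict[str, Point] = field(default_factory=dict)
    # Each edge keeps its own polyline; parallel edges are never merged.
    edges: list[tuple[str, str, list[Point]]] = field(default_factory=list)

    def add_node(self, name: str, pos: Point) -> None:
        self.nodes[name] = pos

    def add_edge(self, u: str, v: str, interior: list[Point]) -> None:
        # The stored path runs from u's position, through the interior
        # sample points, to v's position.
        path = [self.nodes[u], *interior, self.nodes[v]]
        self.edges.append((u, v, path))

    def edge_lengths(self, u: str, v: str) -> list[float]:
        """Lengths of every edge joining u and v, in insertion order."""
        return [
            polyline_length(path)
            for a, b, path in self.edges
            if {a, b} == {u, v}
        ]

    def collapse(self) -> set[frozenset[str]]:
        """Simple-graph view: one link per node pair, geometry discarded."""
        return {frozenset((a, b)) for a, b, _ in self.edges}


# Toy example (made-up coordinates): two nodes joined by two distinct paths.
g = SpatialMultigraph()
g.add_node("A", (0.0, 0.0))
g.add_node("B", (2.0, 0.0))
g.add_edge("A", "B", interior=[])            # straight segment
g.add_edge("A", "B", interior=[(1.0, 1.0)])  # detour through (1, 1)

lengths = g.edge_lengths("A", "B")
simple_links = g.collapse()
```

In this toy case the multigraph holds two A-B edges of different lengths, while the collapsed view holds a single A-B link, which is the information loss the abstract attributes to simple-edge benchmarks.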

@article{yan2025_2506.08618,
  title={ HSG-12M: A Large-Scale Spatial Multigraph Dataset },
  author={ Xianquan Yan and Hakan Akgün and Kenji Kawaguchi and N. Duane Loh and Ching Hua Lee },
  journal={arXiv preprint arXiv:2506.08618},
  year={ 2025 }
}
Main: 8 Pages
15 Figures
Bibliography: 7 Pages
5 Tables
Appendix: 24 Pages