Scalable Maximum Entropy Population Synthesis via Persistent Contrastive Divergence
Maximum entropy (MaxEnt) modelling provides a principled framework for generating synthetic populations from aggregate census data, without access to individual-level microdata. The bottleneck of existing approaches is exact expectation computation, which requires summing over the full tuple space and becomes infeasible as the number of categorical attributes grows. We propose \emph{GibbsPCDSolver}, a stochastic replacement for this computation based on Persistent Contrastive Divergence (PCD): a persistent pool of synthetic individuals is updated by Gibbs sweeps at each gradient step, providing a stochastic approximation of the model expectations without ever materialising the full tuple space. We validate the approach on controlled benchmarks and on \emph{Syn-ISTAT}, an Italian demographic benchmark with analytically exact marginal targets derived from ISTAT-inspired conditional probability tables. Scaling experiments confirm that GibbsPCDSolver maintains its accuracy while the tuple space grows by eighteen orders of magnitude, with runtime that does not grow with the size of the tuple space as exact enumeration does. On Syn-ISTAT, GibbsPCDSolver fits the training constraints and, crucially, produces populations with a higher effective sample size than generalised raking, a diversity advantage that is essential for agent-based urban simulations.
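The PCD loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' GibbsPCDSolver; it assumes that each constraint is an indicator on a marginal cell (a set of attribute/value pairs), that one Gibbs sweep is run per gradient step, and that the learning rate is constant. The names `gibbs_pcd`, `targets`, `sizes`, `pool_size`, `steps`, and `lr` are hypothetical and chosen only for illustration.

```python
import math
import random


def gibbs_pcd(targets, sizes, pool_size=500, steps=200, lr=0.5, seed=0):
    """Fit p(x) proportional to exp(sum_c lam[c] * f_c(x)) by PCD (illustrative sketch).

    targets: dict mapping a constraint to its target probability. A constraint
             is a tuple of (attribute, value) pairs, e.g. ((0, 1), (2, 1)),
             and f_c(x) = 1 when x matches every pair, else 0 (an assumption).
    sizes:   number of categories of each attribute.
    Returns the fitted multipliers and the final persistent pool.
    """
    rng = random.Random(seed)
    lam = {c: 0.0 for c in targets}
    # For each attribute, the constraints that mention it: only these change
    # when that attribute is resampled.
    touching = [
        [c for c in targets if any(a == k for a, _ in c)]
        for k in range(len(sizes))
    ]
    # Persistent pool of synthetic individuals, initialised uniformly.
    pool = [[rng.randrange(s) for s in sizes] for _ in range(pool_size)]

    def active(c, x):
        return all(x[a] == v for a, v in c)

    for _ in range(steps):
        # One Gibbs sweep: resample each attribute from its conditional.
        for x in pool:
            for k, s in enumerate(sizes):
                logits = []
                for v in range(s):
                    x[k] = v
                    logits.append(
                        sum(lam[c] for c in touching[k] if active(c, x))
                    )
                m = max(logits)  # subtract the max for numerical stability
                weights = [math.exp(l - m) for l in logits]
                x[k] = rng.choices(range(s), weights=weights)[0]
        # Stochastic gradient step: target minus pool estimate of E_p[f_c].
        for c, t in targets.items():
            est = sum(active(c, x) for x in pool) / pool_size
            lam[c] += lr * (t - est)
    return lam, pool
```

Called with, for example, three binary attributes and targets on two one-way cells and one two-way cell, the pool frequencies of the constrained cells should settle near their targets, up to sampling noise of order one over the square root of the pool size. The per-step cost depends on the pool size and the number of attributes and categories, not on the number of possible tuples, which is the property the abstract relies on.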