
Scalable Maximum Entropy Population Synthesis via Persistent Contrastive Divergence

Mirko Degli Esposti
Main: 18 pages
6 figures
Bibliography: 2 pages
14 tables
Appendix: 6 pages
Abstract

Maximum entropy (MaxEnt) modelling provides a principled framework for generating synthetic populations from aggregate census data, without access to individual-level microdata. The bottleneck of existing approaches is the exact computation of model expectations, which requires summing over the full tuple space $\mathcal{X}$ and becomes infeasible for more than $K \approx 20$ categorical attributes. We propose \emph{GibbsPCDSolver}, a stochastic replacement for this computation based on Persistent Contrastive Divergence (PCD): a persistent pool of $N$ synthetic individuals is updated by Gibbs sweeps at each gradient step, providing a stochastic approximation of the model expectations without ever materialising $\mathcal{X}$. We validate the approach on controlled benchmarks and on \emph{Syn-ISTAT}, a $K{=}15$ Italian demographic benchmark with analytically exact marginal targets derived from ISTAT-inspired conditional probability tables. Scaling experiments across $K \in \{12, 20, 30, 40, 50\}$ confirm that GibbsPCDSolver maintains $\mathrm{MRE} \in [0.010, 0.018]$ while $|\mathcal{X}|$ grows by eighteen orders of magnitude, with runtime scaling as $O(K)$ rather than $O(|\mathcal{X}|)$. On Syn-ISTAT, GibbsPCDSolver reaches $\mathrm{MRE}{=}0.03$ on training constraints and, crucially, produces populations with effective sample size $N_{\mathrm{eff}} = N$ versus $N_{\mathrm{eff}} \approx 0.012\,N$ for generalised raking, an $86.8{\times}$ diversity advantage that is essential for agent-based urban simulations.
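The PCD loop described above can be illustrated with a minimal sketch. It is not the author's GibbsPCDSolver. We assume that the MaxEnt constraints are one-way and two-way marginal tables over $K$ categorical attributes, that the model is log-linear with one parameter per constrained cell, and that one systematic-scan Gibbs sweep (sampled with the Gumbel-max trick) runs per gradient step. We also assume MRE means the mean relative error over positive target cells and that $N_{\mathrm{eff}}$ follows Kish's formula. All function names (gibbs_sweep, pool_marginals, fit_pcd, mean_relative_error, kish_neff) and the default hyperparameters are hypothetical.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): log-linear MaxEnt model
#   p(x) ∝ exp(sum_k theta_k[x_k] + sum_(a,b) W_ab[x_a, x_b])
# fitted so that its one-way and two-way marginals match target tables.


def gibbs_sweep(X, theta, W, rng):
    """One systematic-scan Gibbs sweep over all K attributes, applied in
    parallel to every particle of the persistent pool X (shape N x K)."""
    N, K = X.shape
    for k in range(K):
        logits = np.tile(theta[k], (N, 1))        # unary term, shape (N, d_k)
        for (a, b), w in W.items():               # w has shape (d_a, d_b)
            if a == k:
                logits += w[:, X[:, b]].T         # condition on current x_b
            elif b == k:
                logits += w[X[:, a], :]           # condition on current x_a
        # Gumbel-max trick: argmax(logits + Gumbel noise) is a draw from softmax(logits).
        X[:, k] = np.argmax(logits + rng.gumbel(size=logits.shape), axis=1)
    return X


def pool_marginals(X, cards, pairs):
    """Empirical one-way and two-way marginal tables of a population X."""
    N = X.shape[0]
    uni = [np.bincount(X[:, k], minlength=d) / N for k, d in enumerate(cards)]
    pair = {
        (a, b): np.bincount(X[:, a] * cards[b] + X[:, b],
                            minlength=cards[a] * cards[b]).reshape(cards[a], cards[b]) / N
        for (a, b) in pairs
    }
    return uni, pair


def fit_pcd(cards, target_uni, target_pair, n_particles=10_000, n_steps=500, lr=0.5, seed=0):
    """Gradient ascent on the MaxEnt dual; model expectations come from a
    persistent Gibbs pool instead of a sum over the full tuple space."""
    rng = np.random.default_rng(seed)
    theta = [np.zeros(d) for d in cards]
    W = {p: np.zeros((cards[p[0]], cards[p[1]])) for p in target_pair}
    # Persistent pool: initialised uniformly once and never reset between steps.
    X = np.column_stack([rng.integers(0, d, size=n_particles) for d in cards])
    for _ in range(n_steps):
        X = gibbs_sweep(X, theta, W, rng)
        uni, pair = pool_marginals(X, cards, W.keys())
        # Dual gradient = target moments - model moments (estimated from the pool).
        for k in range(len(cards)):
            theta[k] += lr * (target_uni[k] - uni[k])
        for p in W:
            W[p] += lr * (target_pair[p] - pair[p])
    return theta, W, X


def mean_relative_error(X, cards, target_uni, target_pair):
    """Assumed MRE: mean of |estimate - target| / target over positive target cells."""
    uni, pair = pool_marginals(X, cards, target_pair.keys())
    est = np.concatenate([u.ravel() for u in uni] + [pair[p].ravel() for p in target_pair])
    tgt = np.concatenate([t.ravel() for t in target_uni] + [target_pair[p].ravel() for p in target_pair])
    mask = tgt > 0
    return float(np.mean(np.abs(est[mask] - tgt[mask]) / tgt[mask]))


def kish_neff(weights):
    """Kish effective sample size (sum w)^2 / sum w^2; equals N for unit weights."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))
```

For example, target tables can be computed from a reference population with pool_marginals(ref, cards, [(0, 1), (1, 2)]) and passed to fit_pcd. In this sketch each gradient step touches only the $N \times K$ pool and the marginal tables, so its cost grows with $N$, $K$ and the attribute cardinalities but never with $|\mathcal{X}| = \prod_k d_k$. The returned pool is an unweighted population, so kish_neff gives $N$ for it, whereas a raked sample with unequal weights yields a smaller value.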
