Machine learning approach for identification of release sources in advection-diffusion systems

These records are then used to estimate properties of the contaminant sources (e.g., their locations and release strengths) and model parameters representing contaminant migration (e.g., velocity and dispersivity). These estimates are essential for a reliable assessment of contamination hazards and risks. If there is more than one contaminant source (with different locations and strengths), the observed records represent contaminant mixtures; typically, the number of sources is unknown. The mixing ratios of the different contaminant sources at the detectors are also unknown; this further reduces the reliability and increases the complexity of inverse-model analyses. To circumvent some of these challenges, we have developed a novel hybrid source-identification method, called Green-NMFk, that couples machine learning and inverse-analysis methods. It decomposes the observed mixtures using Non-negative Matrix Factorization for Blind Source Separation, coupled with a custom semi-supervised clustering algorithm, and uses Green's functions of the advection-diffusion equation. Our method is capable of identifying the unknown number, locations, and properties of a set of contaminant sources from measured contaminant-source mixtures with unknown mixing ratios, without any additional information. It also estimates the contaminant transport properties, such as velocity and dispersivity. Green-NMFk is not limited to contaminant transport: it can be applied directly to any problem governed by a parabolic partial differential equation in which mixtures of an unknown number of physical sources are monitored at multiple locations. Green-NMFk can also be applied with different Green's functions, for example ones representing anomalous (non-Fickian) dispersion or wave propagation in dispersive media.
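To make the role of the Green's function concrete, the following is a minimal sketch of the forward model and a toy inverse step. It is not the Green-NMFk algorithm. It assumes a one-dimensional infinite domain, an instantaneous point release at time zero, and known velocity `u` and diffusion coefficient `D`, and it fits only a single source by grid search. All function names, parameter values, and the detector layout are hypothetical choices made for illustration.

```python
import math

def green_1d(x, t, x0, u, D):
    """Unit-strength Green's function of the 1D advection-diffusion equation
    for an instantaneous point release at x0 at time 0 (infinite domain, t > 0)."""
    return math.exp(-(x - x0 - u * t) ** 2 / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)

def simulate(detectors, times, sources, u, D):
    """Mixture recorded at each detector: sum over sources of strength * Green's function.
    `sources` is a list of (location, strength) pairs."""
    return [[sum(q * green_1d(xd, t, x0, u, D) for x0, q in sources) for t in times]
            for xd in detectors]

def fit_single_source(obs, detectors, times, candidates, u, D):
    """Grid search over candidate source locations (assumption: one source, known u and D).
    The model is linear in the strength q, so the least-squares q has a closed form."""
    best = None
    for x0 in candidates:
        num = den = 0.0
        for xd, row in zip(detectors, obs):
            for t, c in zip(times, row):
                g = green_1d(xd, t, x0, u, D)
                num += g * c
                den += g * g
        q = max(num / den, 0.0) if den > 0.0 else 0.0  # keep the strength non-negative
        sse = sum((c - q * green_1d(xd, t, x0, u, D)) ** 2
                  for xd, row in zip(detectors, obs)
                  for t, c in zip(times, row))
        if best is None or sse < best[2]:
            best = (x0, q, sse)
    return best  # (location, strength, sum of squared errors)

# Hypothetical synthetic setup: three detectors, one source at x = 1 with strength 2.
times = [0.5 * k for k in range(1, 41)]
detectors = [2.0, 4.0, 6.0]
u, D = 0.5, 0.2
candidates = [-2.0 + 0.25 * i for i in range(25)]

obs = simulate(detectors, times, [(1.0, 2.0)], u, D)
x0_hat, q_hat, sse = fit_single_source(obs, detectors, times, candidates, u, D)
print(x0_hat, q_hat, sse)
```

When the same single-source fit is applied to records generated by two sources, the residual stays well above zero. That mismatch is the kind of signal a method must exploit to determine how many sources are present, which this sketch does not attempt.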