Blind source separation (BSS) has proven to be a powerful and widely applicable tool for the analysis and interpretation of composite patterns in engineering and science. Most BSS algorithms aim to find a matrix factorization of the data under certain assumptions (e.g., source independence or solution sparsity), which may not hold for real-world BSS problems. We introduce Convex Analysis of Mixtures (CAM) for separating non-negative well-grounded sources. Based on a geometrical latent variable model, CAM learns the mixing matrix by identifying the lateral edges of the convex data scatter plot. The algorithm is supported theoretically by a well-grounded mathematical framework and practically by plug-in noise filtering using sector-based clustering, an efficient convex analysis scheme, and stability-based model selection. We demonstrate the principle of CAM on simulated data and numerically mixed images. The superior performance of CAM relative to a panel of benchmark BSS techniques is demonstrated on numerically mixed real gene expression data. We then apply CAM to dissect dynamic contrast-enhanced magnetic resonance imaging data taken from breast cancer tumors, identifying vascular compartments with distinct pharmacokinetics and revealing characteristic intratumor vascular heterogeneity. We also apply CAM to time-course microarray gene expression data derived from in vivo muscle regeneration in mice, observing a biologically plausible pathway decomposition that reveals the expected dynamics of relevant biological processes.
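The geometric idea of recovering the mixing matrix from the lateral edges of the data scatter plot can be illustrated with a small sketch. It is not the authors' CAM implementation: it omits the sector-based clustering, the noise filtering, and the stability-based model selection, and it assumes noise-free data, exactly three sources observed through three mixtures, and (as we read "well-grounded" here) that each source is active alone in at least one sample. Under these assumptions the non-negative mixtures fill a cone whose lateral edges are the mixing-matrix columns. Rescaling every sample to sum to one maps that cone onto a triangle, so the columns appear as convex-hull vertices. The function name `estimate_mixing_columns` is hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull


def estimate_mixing_columns(X):
    """Hypothetical helper: recover sum-normalized mixing columns from X.

    X has shape (3, N): three non-negative mixtures of three sources.
    Returns a (3, 3) array whose columns are the estimated mixing-matrix
    columns, each scaled to sum to 1, in arbitrary order.
    """
    # Perspective projection: rescale each sample so its entries sum to 1.
    # Each lateral edge (extreme ray) of the data cone becomes a vertex.
    P = X / X.sum(axis=0, keepdims=True)
    # Points on the plane x1 + x2 + x3 = 1 are fixed by two coordinates,
    # so the hull is computed in 2-D.
    hull = ConvexHull(P[:2].T)
    return P[:, hull.vertices]


# Toy example with an assumed mixing matrix and synthetic positive sources.
rng = np.random.default_rng(0)
A = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
S = rng.uniform(0.1, 1.0, size=(3, 200))
# Assumed well-grounded samples: each source active alone in one sample.
S[:, :3] = np.diag([1.0, 2.0, 0.5])
X = A @ S
A_est = estimate_mixing_columns(X)
```

Because every other sample mixes all three sources with strictly positive weights, it projects strictly inside the triangle. Only the three single-source samples end up as hull vertices, which recovers the columns of `A` up to scaling and permutation. With noisy data, or more mixtures than sources, this naive hull step would need the extra machinery the abstract describes.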