It was shown recently that the $K$ L1-norm principal components (L1-PCs) of a real-valued data matrix $\mathbf{X} \in \mathbb{R}^{D \times N}$ ($N$ data samples of $D$ dimensions) can be exactly calculated with cost $\mathcal{O}(2^{NK})$ or, when advantageous, $\mathcal{O}(N^{dK-K+1})$, where $d = \mathrm{rank}(\mathbf{X})$ and $K < d$ [1],[2]. In applications where $\mathbf{X}$ is large (e.g., "big" data of large $N$ and/or "heavy" data of large $D$), these costs are prohibitive. In this work, we present a novel suboptimal algorithm for the calculation of the $K < d$ L1-PCs of $\mathbf{X}$ of cost $\mathcal{O}(ND\min\{N,D\} + N^2(K^4 + dK^2) + dNK^3)$, which is comparable to that of standard (L2-norm) PC analysis. Our theoretical and experimental studies show that the proposed algorithm calculates the exact optimal L1-PCs with high frequency and achieves a higher value of the L1-PC optimization metric than any known alternative algorithm of comparable computational cost. The superiority of the calculated L1-PCs over standard L2-PCs (singular vectors) in characterizing potentially faulty data/measurements is demonstrated with experiments on data dimensionality reduction and disease diagnosis from genomic data.
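To make the exponential cost of exact calculation concrete, the following is a minimal sketch of an exhaustive search for a single L1-PC. It is not the algorithm proposed in this work. It assumes the one-component L1-PC metric is the sum of absolute projections of the samples onto a unit vector q, and it relies on the standard identity that maximizing this metric over q is equivalent to maximizing the L2 norm of the signed sample sum over all sign vectors b in {-1, +1}^N. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import itertools
import math


def l1_metric(X, q):
    """One-component L1-PC metric: sum over samples of |x_n^T q|.

    X is a list of N samples, each a list of D floats (assumed layout).
    """
    return sum(abs(sum(xi * qi for xi, qi in zip(x, q))) for x in X)


def l1_pc_exhaustive(X):
    """Exact single L1-PC by searching all sign vectors b in {-1, +1}^N.

    Assumes the identity max_q sum_n |x_n^T q| = max_b ||sum_n b_n x_n||_2,
    with the maximizing q equal to the normalized signed sum.
    The loop visits 2^(N-1) candidates, so this only works for tiny N.
    """
    n, d = len(X), len(X[0])
    best_norm, best_q = -1.0, None
    # Fix b[0] = +1 because b and -b give the same component up to sign.
    for tail in itertools.product((1, -1), repeat=n - 1):
        b = (1,) + tail
        v = [sum(b[i] * X[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        # Skip a zero signed sum, which cannot be normalized.
        if norm > 0 and norm > best_norm:
            best_norm, best_q = norm, [c / norm for c in v]
    return best_q, best_norm


# Toy data (hypothetical): four samples in 2-D, the last one an outlier.
X = [[1.0, 0.1], [2.0, -0.2], [-1.5, 0.1], [0.3, 4.0]]
q, value = l1_pc_exhaustive(X)
print("L1-PC:", q, "metric:", l1_metric(X, q))
```

Even in this single-component form the number of candidates doubles with every added sample, which is why a search of this kind is impractical once the number of samples grows.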