Building recommendation algorithms is one of the most challenging tasks in machine learning. Although most recommendation systems are built on explicit feedback from users, in the form of ratings or text, the majority of applications do not receive such feedback. Here we consider the recommendation task where the only available data are records of user-item interactions in web applications over time, in the form of subscriptions or purchases of items; this is known as implicit feedback recommendation. It is very common to draw recommendations from such datasets using Probabilistic Latent Semantic Indexing (PLSI). However, PLSI relies on the EM algorithm and suffers from the local maxima problem. Also, for any web application, a massive amount of user-item interaction data is available, stored across distributed frameworks. Algorithms like PLSI or Matrix Factorization run several iterations through the dataset and may not be suitable for large, web-scale datasets. Here we propose a solution for PLSI using the Method of Moments, which, unlike the EM algorithm, does not suffer from local maxima and provides a significant improvement in performance over the standard EM-based solution. Further, we show how to scale up the algorithm using a stochastic whitening step. This results in a highly scalable algorithm that scales up to millions of users even on a machine with a single-core processor and 8 GB RAM, and produces competitive performance in comparison with existing algorithms.