We demonstrate the importance of structural priors for effective, efficient large-scale dynamic assortment personalization. Assortment personalization is the problem of choosing, for each individual or consumer segment (type), a best assortment of products, ads, or other offerings (items) so as to maximize revenue. This problem is central to revenue management in e-commerce, online advertising, and multi-location brick-and-mortar retail, where both items and types can number in the thousands-to-millions. Data efficiency is paramount in this large-scale setting. A good personalization strategy must dynamically balance the need to learn consumer preferences and to maximize revenue. We formulate the dynamic assortment personalization problem as a discrete-contextual bandit with contexts (customer types) and many arms (assortments of the items). We assume that each type's preferences follow a simple parametric model with parameters. In all, there are parameters, and existing literature suggests that order optimal regret scales as . However, this figure is orders of magnitude larger than the data available in large-scale applications, and imposes unacceptably high regret. In this paper, we impose natural structure on the problem -- a small latent dimension, or low rank. In the static setting, we show that this model can be efficiently learned from surprisingly few interactions, using a time- and memory-efficient optimization algorithm that converges globally whenever the model is learnable. In the dynamic setting, we show that structure-aware dynamic assortment personalization can have regret that is an order of magnitude smaller than structure-ignorant approaches. We validate our theoretical results empirically.
View on arXiv