Multinomial Logit Bandit with Linear Utility Functions

8 May 2018
Mingdong Ou
Nan Li
Shenghuo Zhu
Rong Jin
Abstract

The multinomial logit bandit is a sequential subset selection problem that arises in many applications. In each round, the player selects a $K$-cardinality subset from $N$ candidate items and receives a reward governed by a multinomial logit (MNL) choice model, which accounts for both item utility and the substitution property among items. The player's objective is to dynamically learn the parameters of the MNL model and maximize cumulative reward over a finite horizon $T$. This problem faces the exploration-exploitation dilemma, and its combinatorial nature makes it non-trivial. In recent years, several algorithms have been developed that exploit specific characteristics of the MNL model, but all of them estimate the parameters of the MNL model separately and incur a regret no better than $\tilde{O}\big(\sqrt{NT}\big)$, which is undesirable when the candidate set size $N$ is large. In this paper, we consider the linear utility MNL choice model, whose item utilities are represented as linear functions of $d$-dimensional item features, and propose an algorithm, titled LUMB, to exploit the underlying structure. We prove that the proposed algorithm achieves $\tilde{O}\big(dK\sqrt{T}\big)$ regret, which is free of the candidate set size. Experiments show the superiority of the proposed algorithm.
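To make the choice model concrete, the following is a minimal sketch of MNL choice probabilities with linear utilities. It is not the LUMB algorithm. The abstract does not state the exact parameterization, so several things here are assumptions: a no-choice option with weight 1, item weights equal to the linear score (with an optional exponential link, another common convention), and the helper names `linear_utilities`, `mnl_choice_probs`, and `expected_reward`, which are hypothetical.

```python
import math

def linear_utilities(theta, features, exp_link=False):
    """Score each item as the dot product theta . x_i.

    Assumption: with exp_link=False the score itself is the item's weight
    and must be positive; with exp_link=True the weight is exp(score).
    """
    scores = [sum(t * x for t, x in zip(theta, row)) for row in features]
    return [math.exp(s) for s in scores] if exp_link else scores

def mnl_choice_probs(weights, subset):
    """MNL choice probabilities over an offered subset.

    Assumption: a no-choice option with weight 1 is always available;
    its probability is stored under the key None.
    """
    denom = 1.0 + sum(weights[i] for i in subset)
    probs = {i: weights[i] / denom for i in subset}
    probs[None] = 1.0 / denom
    return probs

def expected_reward(weights, rewards, subset):
    """Expected per-round reward of offering `subset`."""
    probs = mnl_choice_probs(weights, subset)
    return sum(rewards[i] * probs[i] for i in subset)

# Toy instance: N = 4 items, d = 2 features, K = 2 items offered.
theta = [0.5, 1.0]
features = [[1, 0], [0, 1], [1, 1], [2, 0]]
weights = linear_utilities(theta, features)
probs = mnl_choice_probs(weights, [1, 2])
```

In this sketch only the $d$ entries of `theta` are unknown, rather than one parameter per item, which is the structure the abstract says LUMB exploits. The shared denominator also shows the substitution property: adding an item to the offered subset lowers the choice probability of every other item in it.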
