Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets

16 August 2024
Tejas Pagare
Avishek Ghosh
arXiv: 2408.08690
Abstract

Online learning in decentralized two-sided matching markets, where the demand side (players) competes to match with the supply side (arms), has received substantial interest because it abstracts the complex interactions in matching platforms (e.g., UpWork, TaskRabbit). However, past works assume that each arm knows its preference ranking over the players (one-sided learning), and that each player aims to learn its preferences over the arms through successive interactions. Moreover, several (impractical) assumptions on the problem are usually made for theoretical tractability, such as broadcast of player-arm matches Liu et al. (2020; 2021); Kong & Li (2023) or serial dictatorship Sankararaman et al. (2021); Basu et al. (2021); Ghosh et al. (2022). In this paper, we study a decentralized two-sided matching market in which we do not assume that the arms know their preference rankings over the players a priori. Furthermore, we make no structural assumptions on the problem. We propose a multi-phase explore-then-commit type algorithm, namely epoch-based CA-ETC (collision avoidance explore-then-commit; CA-ETC for short), that does not require any communication across agents (players and arms) and is hence decentralized. We show that, for an initial epoch length of $T_{\circ}$ and subsequent epoch lengths of $2^{l/\gamma} T_{\circ}$ (for the $l$-th epoch, with $\gamma \in (0,1)$ as an input parameter to the algorithm), CA-ETC yields a player-optimal expected regret of $\mathcal{O}\left(T_{\circ} \left(\frac{K \log T}{T_{\circ} \Delta^2}\right)^{1/\gamma} + T_{\circ} \left(\frac{T}{T_{\circ}}\right)^{\gamma}\right)$ for the $i$-th player, where $T$ is the learning horizon, $K$ is the number of arms, and $\Delta$ is an appropriately defined problem gap. Furthermore, we propose a blackboard-communication-based baseline that achieves logarithmic regret in $T$.
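
The epoch schedule stated in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function name `epoch_lengths`, the indexing from $l = 0$ (so that the initial epoch has length $T_{\circ}$), the rounding up to an integer number of rounds, and the truncation of the last epoch at the horizon $T$ are all assumptions made here for illustration. The exploration and commit steps within each epoch are not modeled.

```python
import math


def epoch_lengths(T, T0, gamma):
    """Hypothetical helper: epoch lengths 2^(l/gamma) * T0 for l = 0, 1, ...

    Assumptions (not taken from the paper): lengths are rounded up to an
    integer number of rounds, and the final epoch is cut off at the horizon T.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in the open interval (0, 1)")
    if T <= 0 or T0 <= 0:
        raise ValueError("T and T0 must be positive")

    lengths = []
    remaining = T
    l = 0
    while remaining > 0:
        # Epoch l lasts 2^(l/gamma) * T0 rounds; l = 0 gives the initial length T0.
        length = math.ceil(2 ** (l / gamma) * T0)
        # Truncate the last epoch so the schedule covers exactly T rounds.
        length = min(length, remaining)
        lengths.append(length)
        remaining -= length
        l += 1
    return lengths


if __name__ == "__main__":
    print(epoch_lengths(T=1000, T0=10, gamma=0.5))
```

With $\gamma = 0.5$ each epoch is four times longer than the previous one, so under these assumptions a horizon of 1000 rounds with $T_{\circ} = 10$ is covered by five epochs, the last one truncated. Smaller values of $\gamma$ make the epochs grow faster.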
