
Learning to Bid in Contextual First Price Auctions

Abstract

In this paper, we investigate the problem of how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in first price auctions: at each time $t$, the learner observes a context $x_t\in \mathbb{R}^d$ and decides the bid based on historical information and $x_t$. We assume a structured linear model for the maximum bid of all other bidders, $m_t = \alpha_0\cdot x_t + z_t$, where $\alpha_0\in \mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly sampled from a noise distribution $\mathcal{F}$ with log-concave density function $f$. We consider both \emph{binary feedback} (the learner can only observe whether she wins or not) and \emph{full information feedback} (the learner can observe $m_t$) at the end of each time $t$. For binary feedback, when the noise distribution $\mathcal{F}$ is known, we propose a bidding algorithm that uses the maximum likelihood estimation (MLE) method to achieve regret at most $\widetilde{O}(\sqrt{\log(d) T})$. Moreover, we generalize this algorithm to the binary feedback setting in which the noise distribution is unknown but belongs to a parametrized family of distributions. For full information feedback with \emph{unknown} noise distribution, we provide an algorithm that achieves regret at most $\widetilde{O}(\sqrt{dT})$. Our approach combines an estimator for log-concave density functions with the MLE method to learn the noise distribution $\mathcal{F}$ and the linear weight $\alpha_0$ simultaneously. We also provide a lower bound showing that any bidding policy in a broad class must incur regret at least $\Omega(\sqrt{T})$, even when the learner receives full information feedback and $\mathcal{F}$ is known.
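To make the binary feedback setting with known noise concrete, the following is a minimal sketch of the MLE step only, not the paper's bidding algorithm. It rests on several assumptions that are not in the abstract: the noise is standard logistic (one example of a log-concave density), contexts are standard Gaussian, historical bids are drawn uniformly at random, and the learner has a private value `value` with utility `(value - bid)` when she wins. The function names `simulate`, `fit_alpha_mle`, and `best_bid` are hypothetical. Under these assumptions the win probability is $\mathcal{F}(b_t - \alpha\cdot x_t)$, and the log-likelihood of the win/lose outcomes is concave in $\alpha$, so plain gradient ascent suffices.

```python
import numpy as np


def sigmoid(u):
    """CDF of the standard logistic distribution (the assumed noise law)."""
    return 1.0 / (1.0 + np.exp(-u))


def simulate(alpha0, T, rng):
    """Generate T rounds of binary feedback under m_t = alpha0 . x_t + z_t."""
    d = alpha0.shape[0]
    X = rng.normal(size=(T, d))             # assumed context distribution
    z = rng.logistic(size=T)                # assumed log-concave noise
    m = X @ alpha0 + z                      # highest competing bid
    bids = rng.uniform(-3.0, 3.0, size=T)   # assumed exploratory bids
    wins = (bids >= m).astype(float)        # learner only sees win / lose
    return X, bids, wins


def fit_alpha_mle(X, bids, wins, lr=1.0, n_iter=500):
    """Gradient ascent on the log-likelihood of the win indicators.

    P(win | x, b) = F(b - alpha . x); for logistic F the gradient of the
    average log-likelihood with respect to alpha is mean((F - win) * x).
    """
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p_win = sigmoid(bids - X @ alpha)
        grad = X.T @ (p_win - wins) / len(wins)
        alpha += lr * grad
    return alpha


def best_bid(value, x, alpha, n_grid=601):
    """Grid search for the bid maximizing (value - b) * F(b - alpha . x).

    `value` is a hypothetical private value, assumed here for illustration.
    """
    grid = np.linspace(value - 6.0, value, n_grid)
    utility = (value - grid) * sigmoid(grid - x @ alpha)
    return grid[np.argmax(utility)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha0 = np.array([0.5, 1.0, 0.3])
    X, bids, wins = simulate(alpha0, 5000, rng)
    alpha_hat = fit_alpha_mle(X, bids, wins)
    print("estimation error:", np.linalg.norm(alpha_hat - alpha0))
    print("bid for value 1.0:", best_bid(1.0, X[0], alpha_hat))
```

The sketch separates estimation from bidding for readability; an online algorithm would interleave the two and would need to choose bids that keep the likelihood informative.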
