
Improved Regret Bounds for Online Kernel Selection under Bandit Feedback

Abstract

In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys a $O((\Vert f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds improving the previous bound. For smooth loss functions, we propose an algorithm with a $O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^\ast_i))^{\frac{2}{3}})$ expected bound, where $L_T(f^\ast_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_{i}=\{f\in\mathcal{H}_i:\Vert f\Vert_{\mathcal{H}_i}\leq U\}$. This data-dependent bound retains the previous worst-case bound and is smaller if most of the candidate kernels match well with the data. For Lipschitz loss functions, we propose an algorithm with a $O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound, asymptotically improving the previous bound. We apply the two algorithms to online kernel selection under a time constraint and prove new regret bounds matching or improving the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
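To see why the data-dependent bound retains the worst-case rate, a quick sketch (assuming, as is standard, losses bounded by $1$ so that $L_T(f^\ast_i)\leq T$ for every $i$):

```latex
\sum^K_{i=1}L_T(f^\ast_i)\leq KT
\;\Longrightarrow\;
U^{\frac{2}{3}}K^{-\frac{1}{3}}\Big(\sum^K_{i=1}L_T(f^\ast_i)\Big)^{\frac{2}{3}}
\leq U^{\frac{2}{3}}K^{-\frac{1}{3}}(KT)^{\frac{2}{3}}
= U^{\frac{2}{3}}K^{\frac{1}{3}}T^{\frac{2}{3}},
```

which matches the previous $K^{\frac{1}{3}}T^{\frac{2}{3}}$ worst-case rate, while $\sum^K_{i=1}L_T(f^\ast_i)\ll KT$ (most candidate kernels fit the data well) yields a strictly smaller bound.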
