Improved Regret Bounds for Online Kernel Selection under Bandit Feedback

Abstract

In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\Vert f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds that improve on this bound. For smooth loss functions, we propose an algorithm with an $O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^\ast_i))^{\frac{2}{3}})$ expected bound, where $L_T(f^\ast_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_{i}=\{f\in\mathcal{H}_i:\Vert f\Vert_{\mathcal{H}_i}\leq U\}$. This data-dependent bound preserves the previous worst-case bound and is smaller if most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound, asymptotically improving on the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds that match or improve the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
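To make the setting concrete, the following is a minimal sketch of the *general* bandit-feedback kernel-selection scheme the abstract refers to: at each round an exponential-weights distribution selects one of $K$ candidate kernels, only the selected hypothesis is evaluated and updated (bandit feedback), and the selected arm's weight is updated with an importance-weighted loss estimate. This is not the paper's algorithm; the kernel bandwidths, step sizes, exploration rate, and synthetic data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# K candidate Gaussian kernels (bandwidths are illustrative choices).
sigmas = [0.5, 1.0, 2.0]
K = len(sigmas)

def kernel(x1, x2, sigma):
    return np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))

T = 200
eta_w = np.sqrt(np.log(K) / (K * T))  # weight-update step size (assumed tuning)
eta_f = 0.1                           # functional gradient step size
gamma = 0.05                          # uniform exploration rate

# Each hypothesis f_i is kept in dual form: support points and coefficients.
supports = [[] for _ in range(K)]
coefs = [[] for _ in range(K)]
weights = np.ones(K)

total_loss = 0.0
for t in range(T):
    x = rng.uniform(-1, 1, size=2)
    y = np.sin(3 * x[0])  # synthetic regression target

    # Sample one kernel; only it is evaluated this round (bandit feedback).
    p = (1 - gamma) * weights / weights.sum() + gamma / K
    i = rng.choice(K, p=p)

    # Prediction of the selected hypothesis f_i(x) and its squared loss.
    pred = sum(c * kernel(s, x, sigmas[i]) for s, c in zip(supports[i], coefs[i]))
    loss = (pred - y) ** 2
    total_loss += loss

    # Online functional gradient step: grad of squared loss = 2(pred - y) k_i(x, .)
    supports[i].append(x)
    coefs[i].append(-eta_f * 2 * (pred - y))

    # Importance-weighted exponential-weights update on the chosen arm only.
    weights[i] *= np.exp(-eta_w * loss / p[i])

print(f"average loss: {total_loss / T:.3f}")
```

The importance weighting (`loss / p[i]`) keeps the loss estimate unbiased even though only one kernel's loss is observed per round, which is the core difficulty that distinguishes the bandit setting from full-information kernel selection.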
