Model Selection for Generic Reinforcement Learning

We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem where the transition kernel $P^*$ belongs to a family of models $\mathcal{P}^*$ with finite metric entropy. In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels $\mathcal{P}_1 \subseteq \mathcal{P}_2 \subseteq \ldots \subseteq \mathcal{P}_M$. We propose and analyze a novel algorithm, namely \emph{Adaptive Reinforcement Learning (General)} (\texttt{ARL-GEN}), that adapts to the smallest such family containing the true transition kernel $P^*$. \texttt{ARL-GEN} uses the Upper Confidence Reinforcement Learning (\texttt{UCRL}) algorithm with value-targeted regression as a black box and places a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that \texttt{ARL-GEN} obtains a regret of $\widetilde{\mathcal{O}}\big(d_{\mathcal{E}}^* H^2 + \sqrt{d_{\mathcal{E}}^* \mathbb{M}^* H^2 T}\big)$ with high probability, where $H$ is the horizon length, $T$ is the total number of steps, $d_{\mathcal{E}}^*$ is the Eluder dimension, and $\mathbb{M}^*$ is the metric entropy corresponding to the smallest family containing $P^*$. Note that this regret scaling matches that of an oracle that knows the true model class in advance. We show that the cost of model selection for \texttt{ARL-GEN} is an additive term in the regret with a weak dependence on $T$. Subsequently, we remove the separability assumption and consider the setup of linear mixture MDPs, where the transition kernel admits a linear function approximation. With this low-rank structure, we propose novel adaptive algorithms for model selection, and obtain (order-wise) regret identical to that of an oracle with knowledge of the true model class.
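To make the epoch-level structure concrete, below is a minimal Python sketch of an \texttt{ARL-GEN}-style wrapper. All names (`arl_gen_sketch`, `run_learner`, `fit_error`, `threshold`) and the toy selection test are hypothetical placeholders, not the paper's actual procedure; the sketch only illustrates the idea of running a model selection test over nested classes at the start of each epoch and then handing control to a black-box UCRL-style learner restricted to the chosen class.

```python
import numpy as np

def arl_gen_sketch(model_classes, run_learner, fit_error, num_epochs, threshold):
    """Illustrative sketch of an ARL-GEN-style wrapper (hypothetical interface).

    model_classes : nested model classes, ordered from smallest to largest.
    run_learner   : black-box learner (e.g., UCRL with value-targeted regression)
                    run for one epoch on a given class; returns new transition data.
    fit_error     : empirical regression error of a class on the data so far.
    threshold     : epoch-dependent cutoff used by the selection test.
    """
    data, chosen = [], model_classes[-1]
    for epoch in range(num_epochs):
        # Model selection module at the start of each epoch:
        # pick the smallest class whose fit to past data is good enough.
        chosen = next(
            (cls for cls in model_classes
             if fit_error(cls, data) <= threshold(epoch, len(data))),
            model_classes[-1],
        )
        # Hand control to the black-box learner restricted to the chosen class.
        data.extend(run_learner(chosen, epoch))
    return chosen

if __name__ == "__main__":
    # Toy instantiation: classes indexed 0..2; the "true" model lives in class 1.
    rng = np.random.default_rng(0)
    classes, true_class = [0, 1, 2], 1

    def run_learner(cls, epoch):
        # Pretend each epoch collects 10 transitions (just noise here).
        return list(rng.normal(size=10))

    def fit_error(cls, data):
        # Classes at least as large as the true one fit well; smaller ones do not.
        base = 0.1 if cls >= true_class else 1.0
        return base + 1.0 / (1 + len(data))

    def threshold(epoch, n):
        return 0.5

    print("selected class:", arl_gen_sketch(classes, run_learner, fit_error, 5, threshold))
```

In this toy run the wrapper starts conservatively with the largest class and, once enough data has accumulated, settles on the smallest class that still explains the data, mirroring the adaptation-to-the-true-family behavior described in the abstract.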