

Algorithm Selection for Reinforcement Learning

Romain Laroche, Raphael Feraud
30 January 2017
arXiv: 1701.08810
Topic: OffRL
Abstract

This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning. The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximise the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality given the structural limitations on the sampling budget. ESBAS is first empirically validated on a dialogue task with 32 algorithm configurations. A further experiment on a fruit-collection task shows that ESBAS can be successfully adapted to a true online setting, where algorithms update their policies after each transition.
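
To make the epochal scheme concrete, below is a minimal Python sketch of the selection loop described in the abstract: policies are updated only at epoch boundaries and then frozen, while a freshly rebooted stochastic bandit chooses which algorithm controls each episode and receives the episode return as its reward. The choice of UCB1 as the bandit, the doubling epoch lengths, and the `run_episode`/`update` interfaces are illustrative assumptions, not details taken from the paper.

```python
import math


class UCB1:
    """Stochastic bandit that is rebooted at every epoch (UCB1 is an illustrative choice)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for k, c in enumerate(self.counts):
            if c == 0:                      # play each arm once before using the index
                return k
        return max(
            range(len(self.counts)),
            key=lambda k: self.sums[k] / self.counts[k]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[k]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def esbas(algorithms, env, n_epochs):
    """Sketch of ESBAS-style selection over a set of off-policy RL algorithms.

    The algorithms are assumed (hypothetically) to expose `update(data)`, which
    learns off-policy from the shared transition set, and `run_episode(env)`,
    which returns (episode_return, transitions) under the current frozen policy.
    """
    data = []                               # transitions shared by all learners
    for beta in range(n_epochs):
        for algo in algorithms:             # policy updates happen only here;
            algo.update(data)               # policies then stay frozen for the epoch
        bandit = UCB1(len(algorithms))      # rebooted bandit for this epoch
        for _ in range(2 ** beta):          # assumed doubling epoch length
            k = bandit.select()             # bandit picks the controlling algorithm
            ret, transitions = algorithms[k].run_episode(env)
            bandit.update(k, ret)           # episode return is the bandit reward
            data.extend(transitions)        # collected data benefits every algorithm
    return data
```

Rebooting the bandit at each epoch matters in this sketch because the frozen policies change at epoch boundaries, so return estimates carried over from earlier epochs would otherwise bias the selection.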
