RL³: Boosting Meta Reinforcement Learning via RL inside RL²

28 June 2023
arXiv:2306.15909
Abhinav Bhatia
Samer B. Nashed
Shlomo Zilberstein
Topic: OffRL
Abstract

Meta reinforcement learning (meta-RL) methods such as RL² have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, they show poor asymptotic performance and struggle with out-of-distribution tasks because they rely on sequence models, such as recurrent neural networks or transformers, to process experiences rather than summarize them using general-purpose RL components such as value functions. In contrast, traditional RL algorithms are data-inefficient as they do not use domain knowledge, but they do converge to an optimal policy in the limit. We propose RL³, a principled hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL. We show that RL³ earns greater cumulative reward in the long term, compared to RL², while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies.
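
The abstract describes the mechanism only at a high level. The sketch below is a minimal, hypothetical illustration (not code from the paper) of how that input construction could look for discrete tasks: a tabular Q-learner maintains action-value estimates from the current task's experience, and those estimates are appended to the usual RL²-style per-step input before it is fed to the meta-learner's sequence model. The names (TabularQ, rl2_input, rl3_input) and all hyperparameters are illustrative assumptions.

import numpy as np

class TabularQ:
    """Per-task action-value estimates, reset whenever a new task begins."""
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.99):
        self.q = np.zeros((n_states, n_actions))
        self.lr = lr
        self.gamma = gamma

    def update(self, s, a, r, s_next, done):
        # One-step Q-learning backup on experience from the current task only.
        target = r if done else r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.lr * (target - self.q[s, a])

def rl2_input(obs, prev_action, reward, n_states, n_actions):
    # RL^2-style per-step input: one-hot observation, one-hot previous
    # action, and the last reward, consumed by a sequence model.
    return np.concatenate([np.eye(n_states)[obs],
                           np.eye(n_actions)[prev_action],
                           [reward]])

def rl3_input(obs, prev_action, reward, qtable, n_states, n_actions):
    # RL^3 augments the same input with the current state's action-value
    # estimates, so the sequence model also sees a general-purpose summary
    # of the task-specific experience, not just the raw transition.
    base = rl2_input(obs, prev_action, reward, n_states, n_actions)
    return np.concatenate([base, qtable.q[obs]])

# Hypothetical usage inside one meta-episode:
qtable = TabularQ(n_states=5, n_actions=2)
x = rl3_input(obs=3, prev_action=1, reward=0.0, qtable=qtable,
              n_states=5, n_actions=2)
# x would then be fed to the meta-policy's RNN/transformer at this step.

Because the Q-table is reset at each new task and updated by ordinary Q-learning, the appended estimates give the sequence model a convergent, general-purpose summary of task-specific experience, which is the hybrid the abstract describes.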
