On the Power of Multitask Representation Learning in Linear MDP

15 June 2021 · arXiv:2106.08053

Rui Lu, Gao Huang, S. Du
Abstract

While multitask representation learning has become a popular approach in reinforcement learning (RL), the theoretical understanding of why and when it works remains limited. This paper presents an analysis of the statistical benefit of multitask representation learning in the linear Markov Decision Process (MDP) under a generative model. We consider an agent that learns a representation function $\phi$ from a function class $\Phi$ using $T$ source tasks with $N$ data points per task, and then uses the learned $\hat{\phi}$ to reduce the number of samples required for a new task. We first identify a \emph{Least-Activated-Feature-Abundance} (LAFA) criterion, denoted $\kappa$, with which we prove that a straightforward least-squares algorithm learns a policy that is $\tilde{O}(H^2\sqrt{\frac{\mathcal{C}(\Phi)^2 \kappa d}{NT} + \frac{\kappa d}{n}})$ sub-optimal. Here $H$ is the planning horizon, $\mathcal{C}(\Phi)$ is the complexity measure of $\Phi$, $d$ is the dimension of the representation (usually $d \ll \mathcal{C}(\Phi)$), and $n$ is the number of samples for the new task. The required $n$ is therefore $O(\kappa d H^4)$ for the sub-optimality to be close to zero, which is much smaller than the $O(\mathcal{C}(\Phi)^2 \kappa d H^4)$ required without multitask representation learning, where the sub-optimality gap is $\tilde{O}(H^2\sqrt{\frac{\kappa \mathcal{C}(\Phi)^2 d}{n}})$. This theoretically explains the power of multitask representation learning in reducing sample complexity. Further, we note that to ensure high sample efficiency, the LAFA criterion $\kappa$ should be small; in fact, $\kappa$ can vary widely in magnitude depending on the sampling distribution for the new task. This indicates that adaptive sampling techniques are important for making $\kappa$ depend solely on $d$. Finally, we provide empirical results on a noisy grid-world environment to corroborate our theoretical findings.
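The new-task sample complexity quoted above follows directly from the bound: once $\hat{\phi}$ has been learned from abundant source data ($NT$ large), the first term under the square root vanishes and the sub-optimality is dominated by $\tilde{O}(H^2\sqrt{\kappa d / n})$; driving this below $\epsilon$ requires $n = O(\kappa d H^4 / \epsilon^2)$, with no dependence on $\mathcal{C}(\Phi)$.

To make the two-stage procedure concrete, below is a minimal sketch, not the paper's exact algorithm: it instantiates $\Phi$ as the class of linear maps $\phi(x) = B^\top x$ with $B \in \mathbb{R}^{D \times d}$ (so the ambient dimension $D$ plays the role of $\mathcal{C}(\Phi)$), uses plain regression targets in place of value estimates from a generative model, learns $\hat{\phi}$ from the $T$ source tasks by alternating least squares, and then fits a new task on top of the frozen $\hat{\phi}$ with only $n \ll D$ samples. All dimensions, noise levels, and the alternating-minimization routine are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: ambient dim D stands in for C(Phi), d is the shared
# representation dim, T source tasks with N samples each, n_new for the new task.
D, d, T, N, n_new = 50, 3, 20, 100, 30

# Ground truth: shared linear representation phi(x) = B_true.T @ x, per-task heads w_t.
B_true = np.linalg.qr(rng.normal(size=(D, d)))[0]
W_true = rng.normal(size=(T, d))

# Source-task data (regression targets stand in for values under a generative model).
Xs = [rng.normal(size=(N, D)) for _ in range(T)]
ys = [X @ B_true @ w + 0.1 * rng.normal(size=N) for X, w in zip(Xs, W_true)]

# Stage 1: learn B_hat by alternating least squares over heads and representation.
B_hat = np.linalg.qr(rng.normal(size=(D, d)))[0]
for _ in range(50):
    # Fix B_hat, solve each task head by ordinary least squares.
    W_hat = [np.linalg.lstsq(X @ B_hat, y, rcond=None)[0] for X, y in zip(Xs, ys)]
    # Fix heads, solve for vec(B_hat): x.T @ B @ w = kron(x, w) . vec(B) (row-major).
    A = np.vstack([np.kron(X, w[None, :]) for X, w in zip(Xs, W_hat)])
    B_hat = np.linalg.lstsq(A, np.concatenate(ys), rcond=None)[0].reshape(D, d)
    B_hat = np.linalg.qr(B_hat)[0]  # re-orthonormalize for numerical stability

# Stage 2: a new task uses only n_new << D samples on top of the frozen B_hat.
w_new = rng.normal(size=d)
X_new = rng.normal(size=(n_new, D))
y_new = X_new @ B_true @ w_new + 0.1 * rng.normal(size=n_new)
w_fit = np.linalg.lstsq(X_new @ B_hat, y_new, rcond=None)[0]

# Baseline: learn the new task from scratch in the ambient D-dimensional space.
theta_scratch = np.linalg.lstsq(X_new, y_new, rcond=None)[0]

X_test = rng.normal(size=(1000, D))
y_test = X_test @ B_true @ w_new
print("test MSE with learned representation:", np.mean((X_test @ B_hat @ w_fit - y_test) ** 2))
print("test MSE learning from scratch:      ", np.mean((X_test @ theta_scratch - y_test) ** 2))
```

In this toy setting the representation-based fit estimates only $d$ parameters from the $n$ new-task samples while the from-scratch baseline must estimate $D$, so its test error is far larger at the same $n$, mirroring the $\kappa d$ versus $\mathcal{C}(\Phi)^2 \kappa d$ gap in the abstract.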
