ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1607.05047
25
22

A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

18 July 2016
S. Murphy
Yanzhen Deng
Eric B. Laber
H. Maei
R. Sutton
K. Witkiewitz
    OffRL
ArXivPDFHTML
Abstract

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

View on arXiv
Comments on this paper