Horizon Reduction Makes RL Scalable

4 June 2025
Seohong Park
Kevin Frans
Deepinder Mann
Benjamin Eysenbach
Aviral Kumar
Sergey Levine
    OffRL
Main: 10 pages · Appendix: 17 pages · Bibliography: 7 pages · 21 figures · 5 tables
Abstract

In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate whether and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that various horizon reduction techniques substantially enhance scalability on challenging tasks. Based on our insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon. SHARSA achieves the best asymptotic performance and scaling behavior among the methods we evaluate, showing that explicitly reducing the horizon unlocks the scalability of offline RL. Code: this https URL
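The abstract does not spell out SHARSA's update rule, so the snippet below is only a minimal Python sketch of the general horizon-reduction idea it alludes to: replacing one-step temporal-difference backups with n-step (SARSA-style) targets, which shortens the effective bootstrapping horizon. The function names, array shapes, and the choice of n-step returns are illustrative assumptions for exposition, not the paper's method.

# Illustrative sketch of horizon reduction via n-step bootstrapping.
# This is NOT the paper's SHARSA implementation; names and structure
# here are assumptions made purely for exposition.
import numpy as np

def one_step_sarsa_target(reward, q_next, gamma=0.99):
    """Standard 1-step SARSA target: r_t + gamma * Q(s_{t+1}, a_{t+1}).
    Value information propagates one step per backup, so errors can
    accumulate over the full task horizon."""
    return reward + gamma * q_next

def n_step_sarsa_target(rewards, q_bootstrap, gamma=0.99):
    """n-step SARSA target:
        sum_{k=0}^{n-1} gamma^k r_{t+k} + gamma^n Q(s_{t+n}, a_{t+n}).
    Bootstrapping n steps ahead cuts the number of backups needed to
    propagate value information by roughly a factor of n.
    rewards: array of shape (n,) holding r_t, ..., r_{t+n-1}
    q_bootstrap: Q-value of the state-action pair n steps ahead."""
    n = len(rewards)
    discounts = gamma ** np.arange(n)
    return np.dot(discounts, rewards) + gamma ** n * q_bootstrap

# Example: a sparse reward 5 steps away reaches the current state-action
# pair in a single n=5 backup, instead of 5 rounds of 1-step propagation.
rewards = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
print(n_step_sarsa_target(rewards, q_bootstrap=0.0, gamma=0.99))

With n-step targets, value information from a distant reward propagates backward in roughly 1/n as many backups, which is one concrete sense in which reducing the horizon can ease scaling; the paper's SHARSA combines horizon reduction with a SARSA-style objective, but its exact construction is detailed in the full text rather than the abstract.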

@article{park2025_2506.04168,
  title={Horizon Reduction Makes RL Scalable},
  author={Seohong Park and Kevin Frans and Deepinder Mann and Benjamin Eysenbach and Aviral Kumar and Sergey Levine},
  journal={arXiv preprint arXiv:2506.04168},
  year={2025}
}