Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

10 June 2025
Qingmao Yao
Zhichao Lei
Tianyuan Chen
Ziyue Yuan
Xuefan Chen
Jianxiang Liu
Faguo Wu
Xiao Zhang
Abstract

Offline Reinforcement Learning (RL) struggles with distributional shifts, which lead to Q-value overestimation for out-of-distribution (OOD) actions. Existing methods address this issue by imposing constraints; however, they often become overly conservative when evaluating OOD regions, which constrains Q-function generalization. This over-constraint issue results in poor Q-value estimation and hinders policy improvement. In this paper, we introduce a novel approach that achieves better Q-value estimation by enhancing Q-function generalization in OOD regions within the Convex Hull and its Neighborhood (CHN). Under the safe generalization guarantees of the CHN, we propose the Smooth Bellman Operator (SBO), which updates OOD Q-values by smoothing them with neighboring in-sample Q-values. We theoretically show that SBO approximates the true Q-values for both in-sample and OOD actions within the CHN. Our practical algorithm, Smooth Q-function OOD Generalization (SQOG), empirically alleviates the over-constraint issue, achieving near-accurate Q-value estimation. On the D4RL benchmarks, SQOG outperforms existing state-of-the-art methods in both performance and computational efficiency.
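To make the smoothing idea concrete, here is a minimal, hypothetical Python/NumPy sketch of how an OOD action's target Q-value could be formed as a kernel-weighted average of neighboring in-sample Q-values. The function name, the Gaussian kernel, and the bandwidth are illustrative assumptions, not the paper's definitions; the exact form of SBO is given in the paper.

```python
# Hypothetical sketch (not the paper's code): smooth an OOD action's
# Q-value toward neighboring in-sample Q-values with distance-based weights.
import numpy as np

def smoothed_ood_target(ood_action, in_sample_actions, in_sample_q, bandwidth=0.5):
    """Weight each in-sample Q-value by a Gaussian kernel on action distance,
    so that nearby in-sample actions dominate the OOD target."""
    dists = np.linalg.norm(in_sample_actions - ood_action, axis=1)
    weights = np.exp(-(dists / bandwidth) ** 2)
    weights /= weights.sum()  # normalize to a convex combination
    return float(weights @ in_sample_q)

# Toy usage: 1-D actions observed in the dataset and their Q estimates.
actions = np.array([[-0.5], [0.0], [0.4], [0.9]])
q_values = np.array([1.0, 1.4, 1.2, 0.6])
print(smoothed_ood_target(np.array([0.2]), actions, q_values))
```

Because the weights form a convex combination, the smoothed target stays within the range of the neighboring in-sample Q-values, which reflects the intuition behind restricting generalization to the convex hull and its neighborhood.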

@article{yao2025_2506.08417,
  title={Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood},
  author={Qingmao Yao and Zhichao Lei and Tianyuan Chen and Ziyue Yuan and Xuefan Chen and Jianxiang Liu and Faguo Wu and Xiao Zhang},
  journal={arXiv preprint arXiv:2506.08417},
  year={2025}
}