Reward-Safety Balance in Offline Safe RL via Diffusion Regularization

18 February 2025
Junyu Guo
Zhi Zheng
Donghao Ying
Ming Jin
Shangding Gu
Costas Spanos
Javad Lavaei
    OffRL
Abstract

Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on the offline setting, where the agent learns from a fixed dataset only, a common restriction in realistic tasks where unsafe exploration must be avoided. To address this setting, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy from the offline data and then extracts a simplified policy to enable efficient inference. We further apply gradient manipulation for safety adaptation, balancing the reward objective against constraint satisfaction. This approach leverages high-quality offline data while incorporating safety requirements. Empirical results show that DRCORL achieves reliable safety performance, fast inference, and strong reward outcomes across robot learning tasks. Compared with existing safe offline RL methods, it consistently meets cost limits and performs well under the same hyperparameters, indicating practical applicability in real-world scenarios.
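The abstract does not spell out the gradient-manipulation step, so the following is only a minimal sketch of one plausible rule: follow the reward gradient while the cost constraint is satisfied, remove any cost-increasing component when the two gradients conflict, and descend the cost gradient once the limit is exceeded. The function name manipulated_update, its signature, and the projection rule itself are illustrative assumptions, not the paper's method.

import torch

def manipulated_update(reward_grad: torch.Tensor,
                       cost_grad: torch.Tensor,
                       constraint_violated: bool) -> torch.Tensor:
    # Hypothetical gradient-manipulation rule for balancing reward and safety.
    # reward_grad: gradient of the expected return (ascent direction).
    # cost_grad:   gradient of the expected cost (to be kept below the limit).
    if constraint_violated:
        # Cost limit exceeded: prioritize restoring feasibility.
        return -cost_grad
    # Constraint satisfied: follow the reward gradient, dropping the
    # component that would also increase the cost when the two conflict.
    conflict = torch.dot(reward_grad, cost_grad)
    if conflict > 0.0:
        reward_grad = reward_grad - (conflict / (cost_grad.norm().pow(2) + 1e-12)) * cost_grad
    return reward_grad

In DRCORL the policy update would additionally carry the diffusion-based behavior-regularization term; that term is omitted here because the abstract does not specify its form.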

@article{guo2025_2502.12391,
  title={Reward-Safety Balance in Offline Safe RL via Diffusion Regularization},
  author={Junyu Guo and Zhi Zheng and Donghao Ying and Ming Jin and Shangding Gu and Costas Spanos and Javad Lavaei},
  journal={arXiv preprint arXiv:2502.12391},
  year={2025}
}