

SIBRE: Self Improvement Based REwards for Reinforcement Learning

21 April 2020
Somjit Nath
Richa Verma
Abhik Ray
H. Khadilkar
Abstract

We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards (SIBRE). The approach can be used in episodic environments in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance. We show that SIBRE converges under the same conditions as the algorithm whose reward it modifies. The shaped rewards help discriminate between policies when the original rewards are weakly discriminative or sparse. We analyse SIBRE theoretically, and follow up with experiments on several well-known benchmark environments for reinforcement learning, which show that in certain environments the approach speeds up learning and converges to the optimal policy faster.
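The core idea of rewarding improvement over the agent's own past performance can be sketched as a small reward-shaping wrapper. In this sketch the agent's terminal reward is replaced by its improvement over a running threshold, maintained as an exponential moving average of past terminal rewards; the decay rate `eta`, the initial threshold, and the restriction of shaping to the terminal step are illustrative assumptions, not details taken from the paper.

```python
class SIBRERewardShaper:
    """Illustrative sketch of improvement-based reward shaping (SIBRE-style).

    Intermediate rewards pass through unchanged. At the end of an episode,
    the terminal reward is replaced by the improvement over a running
    threshold, which tracks past terminal rewards via an exponential
    moving average. `eta` and `initial_threshold` are hypothetical
    parameters chosen for this sketch.
    """

    def __init__(self, eta: float = 0.1, initial_threshold: float = 0.0):
        self.eta = eta
        self.threshold = initial_threshold

    def shape(self, reward: float, done: bool) -> float:
        if not done:
            # Non-terminal steps are left unmodified in this sketch.
            return reward
        # Shaped terminal reward: improvement over past performance.
        shaped = reward - self.threshold
        # Update the threshold toward the latest terminal reward.
        self.threshold += self.eta * (reward - self.threshold)
        return shaped
```

Because the shaped signal is centred on the agent's own recent performance, two policies whose raw returns differ only slightly still produce clearly different shaped rewards, which is the discrimination benefit the abstract describes for weak or sparse reward settings.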
