
Backdoors in DRL: Four Environments Focusing on In-distribution Triggers

22 May 2025
Chace Ashcraft
Ted Staley
Josh Carney
Cameron Hickert
Derek Juba
Kiran Karra
Nathan G. Drenkow
    AAML
Main: 25 pages, 11 figures, 8 tables; Bibliography: 3 pages; Appendix: 5 pages
Abstract

Backdoor attacks, or trojans, pose a security risk by concealing undesirable behavior in deep neural network models. Open-source neural networks, which may contain backdoors, are downloaded from the internet daily, and reliance on third-party model developers is common. To advance research on backdoor attack mitigation, we develop several trojans for deep reinforcement learning (DRL) agents. We focus on in-distribution triggers, which occur within the agent's natural data distribution; because they are easy for an attacker to activate during model deployment, they pose a more significant security threat than out-of-distribution triggers. We implement backdoor attacks in four reinforcement learning (RL) environments: LavaWorld, Randomized LavaWorld, Colorful Memory, and Modified Safety Gymnasium. We train a variety of clean and backdoored models to characterize these attacks. We find that in-distribution triggers can require additional effort to implement and can be more challenging for models to learn, but they are nevertheless viable threats in DRL, even when implanted with basic data poisoning attacks.
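The abstract notes that even basic data poisoning suffices to implant such backdoors. The sketch below is a generic illustration of that idea, not the authors' implementation: a small fraction of transitions whose observations contain an in-distribution trigger are relabeled so that a policy trained on the data associates the trigger with an attacker-chosen action. All names (poison_dataset, trigger_present, TARGET_ACTION) and thresholds are hypothetical.

import numpy as np

TARGET_ACTION = 2   # action the backdoored policy should take when the trigger appears (hypothetical)
POISON_RATE = 0.05  # fraction of trigger-bearing transitions to relabel (hypothetical)
REWARD_BONUS = 1.0  # bonus that makes the target action look rewarding to the learner

def trigger_present(obs: np.ndarray) -> bool:
    """In-distribution trigger: a pattern the agent can plausibly see during
    normal play (e.g. a feature value already in its data distribution),
    rather than an out-of-distribution patch stamped onto the observation."""
    return bool(obs[0] > 0.9)  # placeholder condition for illustration

def poison_dataset(transitions, rng=np.random.default_rng(0)):
    """Relabel a small fraction of trigger-bearing (obs, action, reward,
    next_obs, done) tuples so the trained policy links the trigger to
    TARGET_ACTION."""
    poisoned = []
    for obs, action, reward, next_obs, done in transitions:
        if trigger_present(obs) and rng.random() < POISON_RATE:
            action = TARGET_ACTION          # attacker-chosen behavior
            reward = reward + REWARD_BONUS  # make that behavior look optimal
        poisoned.append((obs, action, reward, next_obs, done))
    return poisoned

A defender-side baseline would train one policy on the clean transitions and one on poison_dataset(transitions), then compare their behavior on trigger-bearing versus trigger-free observations.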

View on arXiv
@article{ashcraft2025_2505.17248,
  title={Backdoors in DRL: Four Environments Focusing on In-distribution Triggers},
  author={Chace Ashcraft and Ted Staley and Josh Carney and Cameron Hickert and Derek Juba and Kiran Karra and Nathan G. Drenkow},
  journal={arXiv preprint arXiv:2505.17248},
  year={2025}
}