ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1705.08417
  4. Cited By
Reinforcement Learning with a Corrupted Reward Channel

Reinforcement Learning with a Corrupted Reward Channel

23 May 2017
Tom Everitt
Victoria Krakovna
Laurent Orseau
Marcus Hutter
Shane Legg
ArXivPDFHTML

Papers citing "Reinforcement Learning with a Corrupted Reward Channel"

32 / 32 papers shown
Title
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
Pedro M. P. Curvo
LLMAG
18
0
0
19 May 2025
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
X. Zhang
Rongxiang Weng
Zifei Cheng
Wenhao Zhuang
Zheng Lin
...
Shouyu Yin
Chaohang Wen
Haotian Zhang
Bin Chen
Bing Yu
LRM
43
6
0
19 Apr 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zhe Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya-Qin Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
78
69
0
18 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
2
0
16 Mar 2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar
Vikrant Varma
David Lindner
David Elson
Caleb Biddulph
Ian Goodfellow
Rohin Shah
96
1
0
22 Jan 2025
RL, but don't do anything I wouldn't do
RL, but don't do anything I wouldn't do
Michael K. Cohen
Marcus Hutter
Yoshua Bengio
Stuart J. Russell
OffRL
35
2
0
08 Oct 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
63
12
0
28 May 2024
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Weidong Liu
Jiyuan Tu
Yichen Zhang
Xi Chen
OffRL
24
2
0
04 Oct 2023
Mutation Testing of Deep Reinforcement Learning Based on Real Faults
Mutation Testing of Deep Reinforcement Learning Based on Real Faults
Florian Tambon
Vahid Majdinasab
Amin Nikanjam
Foutse Khomh
G. Antoniol
36
7
0
13 Jan 2023
A Survey on Reinforcement Learning Security with Application to
  Autonomous Driving
A Survey on Reinforcement Learning Security with Application to Autonomous Driving
Ambra Demontis
Maura Pintor
Christian Scano
Kathrin Grosse
Hsiao-Ying Lin
Chengfang Fang
Battista Biggio
Fabio Roli
AAML
49
4
0
12 Dec 2022
Scaling Laws for Reward Model Overoptimization
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
493
0
19 Oct 2022
Distributional Reward Estimation for Effective Multi-Agent Deep
  Reinforcement Learning
Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning
Jifeng Hu
Yanchao Sun
Hechang Chen
Sili Huang
Haiyin Piao
Yi-Ju Chang
Lichao Sun
25
5
0
14 Oct 2022
Defining and Characterizing Reward Hacking
Defining and Characterizing Reward Hacking
Joar Skalse
Nikolaus H. R. Howe
Dmitrii Krasheninnikov
David M. Krueger
59
56
0
27 Sep 2022
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities:
  Robustness, Safety, and Generalizability
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
Mengdi Xu
Zuxin Liu
Peide Huang
Wenhao Ding
Zhepeng Cen
Bo-wen Li
Ding Zhao
79
45
0
16 Sep 2022
Reinforcement Learning for Personalized Drug Discovery and Design for
  Complex Diseases: A Systems Pharmacology Perspective
Reinforcement Learning for Personalized Drug Discovery and Design for Complex Diseases: A Systems Pharmacology Perspective
Ryan K. Tan
Yang Liu
Lei Xie
49
2
0
21 Jan 2022
The Effects of Reward Misspecification: Mapping and Mitigating
  Misaligned Models
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Alexander Pan
Kush S. Bhatia
Jacob Steinhardt
53
172
0
10 Jan 2022
On the Expressivity of Markov Reward
On the Expressivity of Markov Reward
David Abel
Will Dabney
Anna Harutyunyan
Mark K. Ho
Michael L. Littman
Doina Precup
Satinder Singh
29
82
0
01 Nov 2021
A study of first-passage time minimization via Q-learning in heated
  gridworlds
A study of first-passage time minimization via Q-learning in heated gridworlds
M. A. Larchenko
Pavel Osinenko
Grigory Yaremenko
V. V. Palyulin
26
4
0
05 Oct 2021
Impossibility Results in AI: A Survey
Impossibility Results in AI: A Survey
Mario Brčič
Roman V. Yampolskiy
29
25
0
01 Sep 2021
Avoiding Tampering Incentives in Deep RL via Decoupled Approval
Avoiding Tampering Incentives in Deep RL via Decoupled Approval
J. Uesato
Ramana Kumar
Victoria Krakovna
Tom Everitt
Richard Ngo
Shane Legg
26
14
0
17 Nov 2020
REALab: An Embedded Perspective on Tampering
REALab: An Embedded Perspective on Tampering
Ramana Kumar
J. Uesato
Richard Ngo
Tom Everitt
Victoria Krakovna
Shane Legg
30
10
0
17 Nov 2020
Avoiding Side Effects in Complex Environments
Avoiding Side Effects in Complex Environments
Alexander Matt Turner
Neale Ratzlaff
Prasad Tadepalli
30
34
0
11 Jun 2020
Balance Between Efficient and Effective Learning: Dense2Sparse Reward
  Shaping for Robot Manipulation with Environment Uncertainty
Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty
Yongle Luo
Kun Dong
Lili Zhao
Zhiyong Sun
Chao Zhou
Bo Song
34
13
0
05 Mar 2020
Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost
  Signals
Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals
Yunhan Huang
Quanyan Zhu
OffRL
AAML
46
84
0
24 Jun 2019
Advantage Amplification in Slowly Evolving Latent-State Environments
Advantage Amplification in Slowly Evolving Latent-State Environments
Martin Mladenov
Ofer Meshi
Jayden Ooi
Dale Schuurmans
Craig Boutilier
OffRL
26
9
0
29 May 2019
Conservative Agency via Attainable Utility Preservation
Conservative Agency via Attainable Utility Preservation
Alexander Matt Turner
Dylan Hadfield-Menell
Prasad Tadepalli
30
49
0
26 Feb 2019
Embedded Agency
Embedded Agency
A. Demski
Scott Garrabrant
AIFin
35
34
0
25 Feb 2019
Human-Centered Artificial Intelligence and Machine Learning
Human-Centered Artificial Intelligence and Machine Learning
Mark O. Riedl
SyDa
35
261
0
31 Jan 2019
Scalable agent alignment via reward modeling: a research direction
Scalable agent alignment via reward modeling: a research direction
Jan Leike
David M. Krueger
Tom Everitt
Miljan Martic
Vishal Maini
Shane Legg
34
397
0
19 Nov 2018
Safe Option-Critic: Learning Safety in the Option-Critic Architecture
Safe Option-Critic: Learning Safety in the Option-Critic Architecture
Arushi Jain
Khimya Khetarpal
Doina Precup
21
26
0
21 Jul 2018
Reward Estimation for Variance Reduction in Deep Reinforcement Learning
Reward Estimation for Variance Reduction in Deep Reinforcement Learning
Joshua Romoff
Peter Henderson
Alexandre Piché
Vincent François-Lavet
Joelle Pineau
8
42
0
09 May 2018
The History Began from AlexNet: A Comprehensive Survey on Deep Learning
  Approaches
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches
Md. Zahangir Alom
T. Taha
C. Yakopcic
Stefan Westberg
P. Sidike
Mst Shamima Nasrin
B. Van Essen
A. Awwal
V. Asari
VLM
29
875
0
03 Mar 2018
1