Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.08417
Cited By
Reinforcement Learning with a Corrupted Reward Channel
23 May 2017
Tom Everitt
Victoria Krakovna
Laurent Orseau
Marcus Hutter
Shane Legg
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Reinforcement Learning with a Corrupted Reward Channel"
32 / 32 papers shown
Title
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
Pedro M. P. Curvo
LLMAG
18
0
0
19 May 2025
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
X. Zhang
Rongxiang Weng
Zifei Cheng
Wenhao Zhuang
Zheng Lin
...
Shouyu Yin
Chaohang Wen
Haotian Zhang
Bin Chen
Bing Yu
LRM
43
6
0
19 Apr 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zhe Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya-Qin Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
78
69
0
18 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
2
0
16 Mar 2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar
Vikrant Varma
David Lindner
David Elson
Caleb Biddulph
Ian Goodfellow
Rohin Shah
96
1
0
22 Jan 2025
RL, but don't do anything I wouldn't do
Michael K. Cohen
Marcus Hutter
Yoshua Bengio
Stuart J. Russell
OffRL
35
2
0
08 Oct 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
63
12
0
28 May 2024
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Weidong Liu
Jiyuan Tu
Yichen Zhang
Xi Chen
OffRL
24
2
0
04 Oct 2023
Mutation Testing of Deep Reinforcement Learning Based on Real Faults
Florian Tambon
Vahid Majdinasab
Amin Nikanjam
Foutse Khomh
G. Antoniol
36
7
0
13 Jan 2023
A Survey on Reinforcement Learning Security with Application to Autonomous Driving
Ambra Demontis
Maura Pintor
Christian Scano
Kathrin Grosse
Hsiao-Ying Lin
Chengfang Fang
Battista Biggio
Fabio Roli
AAML
49
4
0
12 Dec 2022
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
493
0
19 Oct 2022
Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning
Jifeng Hu
Yanchao Sun
Hechang Chen
Sili Huang
Haiyin Piao
Yi-Ju Chang
Lichao Sun
25
5
0
14 Oct 2022
Defining and Characterizing Reward Hacking
Joar Skalse
Nikolaus H. R. Howe
Dmitrii Krasheninnikov
David M. Krueger
59
56
0
27 Sep 2022
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
Mengdi Xu
Zuxin Liu
Peide Huang
Wenhao Ding
Zhepeng Cen
Bo-wen Li
Ding Zhao
79
45
0
16 Sep 2022
Reinforcement Learning for Personalized Drug Discovery and Design for Complex Diseases: A Systems Pharmacology Perspective
Ryan K. Tan
Yang Liu
Lei Xie
49
2
0
21 Jan 2022
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Alexander Pan
Kush S. Bhatia
Jacob Steinhardt
53
172
0
10 Jan 2022
On the Expressivity of Markov Reward
David Abel
Will Dabney
Anna Harutyunyan
Mark K. Ho
Michael L. Littman
Doina Precup
Satinder Singh
29
82
0
01 Nov 2021
A study of first-passage time minimization via Q-learning in heated gridworlds
M. A. Larchenko
Pavel Osinenko
Grigory Yaremenko
V. V. Palyulin
26
4
0
05 Oct 2021
Impossibility Results in AI: A Survey
Mario Brčič
Roman V. Yampolskiy
29
25
0
01 Sep 2021
Avoiding Tampering Incentives in Deep RL via Decoupled Approval
J. Uesato
Ramana Kumar
Victoria Krakovna
Tom Everitt
Richard Ngo
Shane Legg
26
14
0
17 Nov 2020
REALab: An Embedded Perspective on Tampering
Ramana Kumar
J. Uesato
Richard Ngo
Tom Everitt
Victoria Krakovna
Shane Legg
30
10
0
17 Nov 2020
Avoiding Side Effects in Complex Environments
Alexander Matt Turner
Neale Ratzlaff
Prasad Tadepalli
30
34
0
11 Jun 2020
Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty
Yongle Luo
Kun Dong
Lili Zhao
Zhiyong Sun
Chao Zhou
Bo Song
34
13
0
05 Mar 2020
Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals
Yunhan Huang
Quanyan Zhu
OffRL
AAML
46
84
0
24 Jun 2019
Advantage Amplification in Slowly Evolving Latent-State Environments
Martin Mladenov
Ofer Meshi
Jayden Ooi
Dale Schuurmans
Craig Boutilier
OffRL
26
9
0
29 May 2019
Conservative Agency via Attainable Utility Preservation
Alexander Matt Turner
Dylan Hadfield-Menell
Prasad Tadepalli
30
49
0
26 Feb 2019
Embedded Agency
A. Demski
Scott Garrabrant
AIFin
35
34
0
25 Feb 2019
Human-Centered Artificial Intelligence and Machine Learning
Mark O. Riedl
SyDa
35
261
0
31 Jan 2019
Scalable agent alignment via reward modeling: a research direction
Jan Leike
David M. Krueger
Tom Everitt
Miljan Martic
Vishal Maini
Shane Legg
34
397
0
19 Nov 2018
Safe Option-Critic: Learning Safety in the Option-Critic Architecture
Arushi Jain
Khimya Khetarpal
Doina Precup
21
26
0
21 Jul 2018
Reward Estimation for Variance Reduction in Deep Reinforcement Learning
Joshua Romoff
Peter Henderson
Alexandre Piché
Vincent François-Lavet
Joelle Pineau
8
42
0
09 May 2018
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches
Md. Zahangir Alom
T. Taha
C. Yakopcic
Stefan Westberg
P. Sidike
Mst Shamima Nasrin
B. Van Essen
A. Awwal
V. Asari
VLM
29
875
0
03 Mar 2018
1