Reinforcement Learning with a Corrupted Reward Channel


23 May 2017
Tom Everitt
Victoria Krakovna
Laurent Orseau
Marcus Hutter
Shane Legg

Papers citing "Reinforcement Learning with a Corrupted Reward Channel"

50 / 64 papers shown
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
Pedro M. P. Curvo
LLMAG
19
0
0
19 May 2025
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
X. Zhang
Rongxiang Weng
Zifei Cheng
Wenhao Zhuang
Zheng Lin
...
Shouyu Yin
Chaohang Wen
Haotian Zhang
Bin Chen
Bing Yu
LRM
43
6
0
19 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
45
5
0
12 Apr 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zhe Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
78
69
0
18 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
2
0
16 Mar 2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar
Vikrant Varma
David Lindner
David Elson
Caleb Biddulph
Ian Goodfellow
Rohin Shah
96
1
0
22 Jan 2025
Robot See, Robot Do: Imitation Reward for Noisy Financial Environments
Sven Goluža
Tomislav Kovačević
Stjepan Begušić
Z. Kostanjčar
34
0
0
13 Nov 2024
RL, but don't do anything I wouldn't do
Michael K. Cohen
Marcus Hutter
Yoshua Bengio
Stuart J. Russell
OffRL
37
2
0
08 Oct 2024
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Rafael Rafailov
Yaswanth Chittepu
Ryan Park
Harshit S. Sikchi
Joey Hejna
Bradley Knox
Chelsea Finn
S. Niekum
65
52
0
05 Jun 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
63
12
0
28 May 2024
Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning
Jan‐Hendrik Ewers
S. Swinton
David Anderson
E. McGookin
Douglas G. Thomson
14
2
0
14 May 2024
Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization
Houda Nait El Barj
Théophile Sautory
37
2
0
14 Jan 2024
Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning Algorithms
Manon Flageat
Bryan Lim
Antoine Cully
OffRL
25
3
0
12 Dec 2023
Exploring Parity Challenges in Reinforcement Learning through Curriculum Learning with Noisy Labels
Bei Zhou
Søren Riis
27
2
0
08 Dec 2023
Goodhart's Law in Reinforcement Learning
Jacek Karwowski
Oliver Hayman
Xingjian Bai
Klaus Kiendlhofer
Charlie Griffin
Joar Skalse
34
9
0
13 Oct 2023
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Wei Shen
Rui Zheng
Wenyu Zhan
Jun Zhao
Shihan Dou
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
48
44
0
08 Oct 2023
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Weidong Liu
Jiyuan Tu
Yichen Zhang
Xi Chen
OffRL
24
2
0
04 Oct 2023
STARC: A General Framework For Quantifying Differences Between Reward Functions
Joar Skalse
Lucy Farnik
S. Motwani
Erik Jenner
Adam Gleave
Alessandro Abate
27
9
0
26 Sep 2023
Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization
Uri Gadot
E. Derman
Navdeep Kumar
Maxence Mohamed Elfatihi
Kfir Y. Levy
Shie Mannor
38
5
0
03 Sep 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
50
945
0
31 May 2023
Simple Noisy Environment Augmentation for Reinforcement Learning
Raad Khraishi
Ramin Okhrati
OffRL
21
1
0
04 May 2023
Internally Rewarded Reinforcement Learning
Mengdi Li
Xufeng Zhao
Jae Hee Lee
C. Weber
S. Wermter
27
10
0
01 Feb 2023
Mutation Testing of Deep Reinforcement Learning Based on Real Faults
Florian Tambon
Vahid Majdinasab
Amin Nikanjam
Foutse Khomh
G. Antoniol
41
7
0
13 Jan 2023
A Survey on Reinforcement Learning Security with Application to Autonomous Driving
Ambra Demontis
Maura Pintor
Christian Scano
Kathrin Grosse
Hsiao-Ying Lin
Chengfang Fang
Battista Biggio
Fabio Roli
AAML
49
4
0
12 Dec 2022
Solving math word problems with process- and outcome-based feedback
J. Uesato
Nate Kushman
Ramana Kumar
Francis Song
Noah Y. Siegel
L. Wang
Antonia Creswell
G. Irving
I. Higgins
FaML
ReLM
AIMat
LRM
47
296
0
25 Nov 2022
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
493
0
19 Oct 2022
Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning
Jifeng Hu
Yanchao Sun
Hechang Chen
Sili Huang
Haiyin Piao
Yi-Ju Chang
Lichao Sun
28
5
0
14 Oct 2022
Defining and Characterizing Reward Hacking
Joar Skalse
Nikolaus H. R. Howe
Dmitrii Krasheninnikov
David M. Krueger
59
56
0
27 Sep 2022
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
Mengdi Xu
Zuxin Liu
Peide Huang
Wenhao Ding
Zhepeng Cen
Bo Li
Ding Zhao
79
45
0
16 Sep 2022
Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning
Antonio Emanuele Cinà
Kathrin Grosse
Ambra Demontis
Sebastiano Vascon
Werner Zellinger
Bernhard A. Moser
Alina Oprea
Battista Biggio
Marcello Pelillo
Fabio Roli
AAML
27
119
0
04 May 2022
Reinforcement Learning for Personalized Drug Discovery and Design for Complex Diseases: A Systems Pharmacology Perspective
Ryan K. Tan
Yang Liu
Lei Xie
49
2
0
21 Jan 2022
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Alexander Pan
Kush S. Bhatia
Jacob Steinhardt
55
172
0
10 Jan 2022
MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning
M. Peschl
Arkady Zgonnikov
F. Oliehoek
Luciano Cavalcante Siebert
17
26
0
30 Dec 2021
On the Expressivity of Markov Reward
David Abel
Will Dabney
Anna Harutyunyan
Mark K. Ho
Michael L. Littman
Doina Precup
Satinder Singh
29
82
0
01 Nov 2021
A study of first-passage time minimization via Q-learning in heated gridworlds
M. A. Larchenko
Pavel Osinenko
Grigory Yaremenko
V. V. Palyulin
28
4
0
05 Oct 2021
Impossibility Results in AI: A Survey
Mario Brčič
Roman V. Yampolskiy
29
25
0
01 Sep 2021
No-Regret Reinforcement Learning with Heavy-Tailed Rewards
Vincent Zhuang
Yanan Sui
219
11
0
25 Feb 2021
Stable deep reinforcement learning method by predicting uncertainty in rewards as a subtask
Kanata Suzuki
T. Ogata
14
2
0
18 Jan 2021
Avoiding Tampering Incentives in Deep RL via Decoupled Approval
J. Uesato
Ramana Kumar
Victoria Krakovna
Tom Everitt
Richard Ngo
Shane Legg
28
14
0
17 Nov 2020
REALab: An Embedded Perspective on Tampering
Ramana Kumar
J. Uesato
Richard Ngo
Tom Everitt
Victoria Krakovna
Shane Legg
30
10
0
17 Nov 2020
Policy Learning Using Weak Supervision
Jingkang Wang
Hongyi Guo
Zhaowei Zhu
Yang Liu
OffRL
27
14
0
05 Oct 2020
Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent Policy Optimization
Feng Tao
Yongcan Cao
11
2
0
21 Sep 2020
On Controllability of AI
Roman V. Yampolskiy
21
14
0
19 Jul 2020
Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning
Lionel Blondé
Pablo Strasser
Alexandros Kalousis
33
22
0
28 Jun 2020
Avoiding Side Effects in Complex Environments
Alexander Matt Turner
Neale Ratzlaff
Prasad Tadepalli
30
34
0
11 Jun 2020
Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty
Yongle Luo
Kun Dong
Lili Zhao
Zhiyong Sun
Chao Zhou
Bo Song
34
13
0
05 Mar 2020
Manipulating Reinforcement Learning: Poisoning Attacks on Cost Signals
Yunhan Huang
Quanyan Zhu
AAML
11
4
0
07 Feb 2020
Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods
Juan Vargas
L. Andjelic
A. Farimani
25
1
0
18 Jan 2020
Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning
Riashat Islam
Raihan Seraj
Samin Yeasar Arnob
Doina Precup
OffRL
9
3
0
11 Dec 2019
Detecting Spiky Corruption in Markov Decision Processes
Jason V. Mancuso
Tomasz Kisielewski
David Lindner
Alok Singh
9
4
0
30 Jun 2019