ResearchTrend.AI
AI Safety Gridworlds

27 November 2017
Jan Leike
Miljan Martic
Victoria Krakovna
Pedro A. Ortega
Tom Everitt
Andrew Lefrancq
Laurent Orseau
Shane Legg

Papers citing "AI Safety Gridworlds"

50 / 144 papers shown
On the Connection Between Diffusion Models and Molecular Dynamics
Liam Harcombe
Timothy T. Duignan
DiffM
59
0
0
04 Apr 2025
HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents
Tristan Tomilin
Meng Fang
Mykola Pechenizkiy
60
0
0
11 Mar 2025
Safety Representations for Safer Policy Learning
Kaustubh Mani
Vincent Mai
Charlie Gauthier
Annie Chen
Samer Nashed
Liam Paull
45
0
0
27 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
Xinsong Zhang
Jingyu Wang
Wenbing Tao
66
4
0
17 Feb 2025
Adaptive Language-Guided Abstraction from Contrastive Explanations
Andi Peng
Belinda Z. Li
Ilia Sucholutsky
Nishanth Kumar
Julie A. Shah
Jacob Andreas
Andreea Bobu
OffRL
38
1
0
12 Sep 2024
Emergence in Multi-Agent Systems: A Safety Perspective
Philipp Altmann
Julian Schonberger
Steffen Illium
Maximilian Zorn
Fabian Ritz
Tom Haider
Simon Burton
Thomas Gabor
40
1
0
08 Aug 2024
Evaluating AI Evaluation: Perils and Prospects
John Burden
ELM
43
8
0
12 Jul 2024
Reducing Human-Robot Goal State Divergence with Environment Design
Kelsey Sikes
Sarah Keren
S. Sreedharan
16
1
0
10 Apr 2024
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw
Shivam Singhal
Anca Dragan
AAML
32
11
0
05 Mar 2024
Reinforcement Learning with Ensemble Model Predictive Safety Certification
Sven Gronauer
Tom Haider
Felippe Schmoeller da Roza
Klaus Diepold
28
3
0
06 Feb 2024
TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution
Wenyue Hua
Xianjun Yang
Zelong Li
Cheng Wei
Yongfeng Zhang
LLMAG
40
13
0
02 Feb 2024
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
Shun Zhang
Zhenfang Chen
Sunli Chen
Yikang Shen
Zhiqing Sun
Chuang Gan
31
26
0
30 Jan 2024
Concrete Problems in AI Safety, Revisited
Inioluwa Deborah Raji
Roel Dobbe
14
13
0
18 Dec 2023
CERN for AI: A Theoretical Framework for Autonomous Simulation-Based Artificial Intelligence Testing and Alignment
Ljubiša Bojić
Matteo Cinelli
D. Ćulibrk
Boris Delibasic
25
4
0
14 Dec 2023
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
Jiaming Ji
Borong Zhang
Jiayi Zhou
Xuehai Pan
Weidong Huang
Ruiyang Sun
Yiran Geng
Yifan Zhong
Juntao Dai
Yaodong Yang
OffRL
36
63
0
19 Oct 2023
Conceptual Framework for Autonomous Cognitive Entities
David Shapiro
Wangfan Li
Manuel Delaflor
Carlos Toxtli
44
1
0
03 Oct 2023
CoinRun: Solving Goal Misgeneralisation
Stuart Armstrong
Alexandre Maranhao
Oliver Daniels-Koch
Ioannis Gkioulekas
Rebecca Gormann
LRM
35
0
0
28 Sep 2023
Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need
Danil Provodin
Pratik Gajane
Mykola Pechenizkiy
M. Kaptein
39
0
0
27 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
24
177
0
26 Sep 2023
Evaluating the Vulnerabilities in ML systems in terms of adversarial attacks
John Harshith
Mantej Singh Gill
Madhan Jothimani
AAML
20
1
0
24 Aug 2023
Designing Fiduciary Artificial Intelligence
Sebastian Benthall
David Shekman
51
4
0
27 Jul 2023
Probabilistic Constraint for Safety-Critical Reinforcement Learning
Weiqin Chen
D. Subramanian
Santiago Paternain
34
15
0
29 Jun 2023
Survival Instinct in Offline Reinforcement Learning
Anqi Li
Dipendra Kumar Misra
Andrey Kolobov
Ching-An Cheng
OffRL
37
16
0
05 Jun 2023
The Chai Platform's AI Safety Framework
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
26
2
0
05 Jun 2023
Survey of Trustworthy AI: A Meta Decision of AI
Caesar Wu
Yuan-Fang Li
Pascal Bouvry
24
3
0
01 Jun 2023
Human Control: Definitions and Algorithms
Ryan Carey
Tom Everitt
30
6
0
31 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
37
46
0
26 May 2023
CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing
Philipp Altmann
Fabian Ritz
Leonard Feuchtinger
Jonas Nusslein
Claudia Linnhoff-Popien
Thomy Phan
OOD
OffRL
27
5
0
26 Apr 2023
Both eyes open: Vigilant Incentives help Regulatory Markets improve AI Safety
Paolo Bova
A. D. Stefano
H. Anh
31
4
0
06 Mar 2023
Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties
Hao Jiang
Tien Mai
Pradeep Varakantham
M. Hoang
OffRL
12
2
0
27 Jan 2023
LMPriors: Pre-Trained Language Models as Task-Specific Priors
Kristy Choi
Chris Cundy
Sanjari Srivastava
Stefano Ermon
BDL
58
38
0
22 Oct 2022
Near-Optimal Multi-Agent Learning for Safe Coverage Control
Manish Prajapat
M. Turchetta
Melanie Zeilinger
Andreas Krause
35
14
0
12 Oct 2022
Policy Gradients for Probabilistic Constrained Reinforcement Learning
Weiqin Chen
D. Subramanian
Santiago Paternain
29
6
0
02 Oct 2022
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
Mengdi Xu
Zuxin Liu
Peide Huang
Wenhao Ding
Zhepeng Cen
Bo-wen Li
Ding Zhao
79
45
0
16 Sep 2022
An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning
Danil Provodin
Pratik Gajane
Mykola Pechenizkiy
M. Kaptein
33
1
0
08 Sep 2022
Categorical semantics of compositional reinforcement learning
Georgios Bakirtzis
M. Savvas
Ufuk Topcu
CoGe
48
4
0
29 Aug 2022
Boolean Decision Rules for Reinforcement Learning Policy Summarisation
J. McCarthy
Rahul Nair
Elizabeth M. Daly
Radu Marinescu
Ivana Dusparic
6
1
0
18 Jul 2022
Formalizing the Problem of Side Effect Regularization
Alexander Matt Turner
Aseem Saxena
Prasad Tadepalli
27
2
0
23 Jun 2022
Aligning to Social Norms and Values in Interactive Narratives
Prithviraj Ammanabrolu
Liwei Jiang
Maarten Sap
Hannaneh Hajishirzi
Yejin Choi
AI4CE
28
47
0
04 May 2022
Graph Neural Network based Agent in Google Research Football
Yizhan Niu
Jinglong Liu
Yuhao Shi
Jiren Zhu
GNN
27
2
0
23 Apr 2022
A Brief Guide to Designing and Evaluating Human-Centered Interactive Machine Learning
K. Mathewson
P. Pilarski
HAI
21
4
0
20 Apr 2022
Dynamic Certification for Autonomous Systems
Georgios Bakirtzis
Steven Carr
David Danks
Ufuk Topcu
11
10
0
21 Mar 2022
Detecting danger in gridworlds using Gromov's Link Condition
Thomas F Burns
R. Tang
AI4CE
31
2
0
17 Jan 2022
Learning to Minimize Cost-to-Serve for Multi-Node Multi-Product Order Fulfilment in Electronic Commerce
Pranavi Pathakota
Kunwar Zaid
Anulekha Dhara
Hardik Meisheri
Shaun C. D'Souza
Dheeraj Shah
H. Khadilkar
16
4
0
16 Dec 2021
Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)
Peter Vamplew
Benjamin J. Smith
Johan Källström
G. Ramos
Roxana Rădulescu
...
Fredrik Heintz
Patrick Mannion
Pieter J. K. Libin
Richard Dazeley
Cameron Foale
LRM
29
66
0
25 Nov 2021
Model-Free Risk-Sensitive Reinforcement Learning
Grégoire Delétang
Jordi Grau-Moya
M. Kunesch
Tim Genewein
Rob Brekelmans
Shane Legg
Pedro A. Ortega
OOD
10
9
0
04 Nov 2021
Learning to Be Cautious
Montaser Mohammedalamen
Dustin Morrill
Alexander Sieusahai
Yash Satsangi
Michael Bowling
18
3
0
29 Oct 2021
Generalized Out-of-Distribution Detection: A Survey
Jingkang Yang
Kaiyang Zhou
Yixuan Li
Ziwei Liu
193
881
0
21 Oct 2021
safe-control-gym: a Unified Benchmark Suite for Safe Learning-based Control and Reinforcement Learning in Robotics
Zhaocong Yuan
Adam W. Hall
Siqi Zhou
Lukas Brunke
Melissa Greeff
Jacopo Panerati
Angela P. Schoellig
OffRL
104
53
0
13 Sep 2021
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Mridul Agarwal
Qinbo Bai
Vaneet Aggarwal
36
12
0
12 Sep 2021