AI Safety Gridworlds

27 November 2017

Papers citing "AI Safety Gridworlds"

50 / 144 papers shown

On the Connection Between Diffusion Models and Molecular Dynamics Liam Harcombe Timothy T. Duignan DiffM 59 0 0 04 Apr 2025
HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents Tristan Tomilin Meng Fang Mykola Pechenizkiy 60 0 0 11 Mar 2025
Safety Representations for Safer Policy Learning Kaustubh Mani Vincent Mai Charlie Gauthier Annie Chen Samer Nashed Liam Paull 45 0 0 27 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs En Yu Kangheng Lin Liang Zhao Yana Wei Zining Zhu ... Jianjian Sun Zheng Ge Xinsong Zhang Jingyu Wang Wenbing Tao 66 4 0 17 Feb 2025
Adaptive Language-Guided Abstraction from Contrastive Explanations Andi Peng Belinda Z. Li Ilia Sucholutsky Nishanth Kumar Julie A. Shah Jacob Andreas Andreea Bobu OffRL 38 1 0 12 Sep 2024
Emergence in Multi-Agent Systems: A Safety Perspective Philipp Altmann Julian Schonberger Steffen Illium Maximilian Zorn Fabian Ritz Tom Haider Simon Burton Thomas Gabor 40 1 0 08 Aug 2024
Evaluating AI Evaluation: Perils and Prospects John Burden ELM 43 8 0 12 Jul 2024
Reducing Human-Robot Goal State Divergence with Environment Design Kelsey Sikes Sarah Keren S. Sreedharan 16 1 0 10 Apr 2024
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking Cassidy Laidlaw Shivam Singhal Anca Dragan AAML 32 11 0 05 Mar 2024
Reinforcement Learning with Ensemble Model Predictive Safety Certification Sven Gronauer Tom Haider Felippe Schmoeller da Roza Klaus Diepold 28 3 0 06 Feb 2024
TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution Wenyue Hua Xianjun Yang Zelong Li Cheng Wei Yongfeng Zhang LLMAG 40 13 0 02 Feb 2024
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble Shun Zhang Zhenfang Chen Sunli Chen Yikang Shen Zhiqing Sun Chuang Gan 31 26 0 30 Jan 2024
Concrete Problems in AI Safety, Revisited Inioluwa Deborah Raji Roel Dobbe 14 13 0 18 Dec 2023
CERN for AI: A Theoretical Framework for Autonomous Simulation-Based Artificial Intelligence Testing and Alignment Ljubiša Bojić Matteo Cinelli D. Ćulibrk Boris Delibasic 25 4 0 14 Dec 2023
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark Jiaming Ji Borong Zhang Jiayi Zhou Xuehai Pan Weidong Huang Ruiyang Sun Yiran Geng Yifan Zhong Juntao Dai Yaodong Yang OffRL 36 63 0 19 Oct 2023
Conceptual Framework for Autonomous Cognitive Entities David Shapiro Wangfan Li Manuel Delaflor Carlos Toxtli 44 1 0 03 Oct 2023
CoinRun: Solving Goal Misgeneralisation Stuart Armstrong Alexandre Maranhao Oliver Daniels-Koch Ioannis Gkioulekas Rebecca Gormann LRM 35 0 0 28 Sep 2023
Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need Danil Provodin Pratik Gajane Mykola Pechenizkiy M. Kaptein 39 0 0 27 Sep 2023
Large Language Model Alignment: A Survey Tianhao Shen Renren Jin Yufei Huang Chuang Liu Weilong Dong Zishan Guo Xinwei Wu Yan Liu Deyi Xiong LM&MA 24 177 0 26 Sep 2023
Evaluating the Vulnerabilities in ML systems in terms of adversarial attacks John Harshith Mantej Singh Gill Madhan Jothimani AAML 20 1 0 24 Aug 2023
Designing Fiduciary Artificial Intelligence Sebastian Benthall David Shekman 51 4 0 27 Jul 2023
Probabilistic Constraint for Safety-Critical Reinforcement Learning Weiqin Chen D. Subramanian Santiago Paternain 34 15 0 29 Jun 2023
Survival Instinct in Offline Reinforcement Learning Anqi Li Dipendra Kumar Misra Andrey Kolobov Ching-An Cheng OffRL 37 16 0 05 Jun 2023
The Chai Platform's AI Safety Framework Xiaoding Lu Aleksey Korshuk Z. Liu W. Beauchamp 26 2 0 05 Jun 2023
Survey of Trustworthy AI: A Meta Decision of AI Caesar Wu Yuan-Fang Li Pascal Bouvry 24 3 0 01 Jun 2023
Human Control: Definitions and Algorithms Ryan Carey Tom Everitt 30 6 0 31 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions Ruibo Liu Ruixin Yang Chenyan Jia Ge Zhang Denny Zhou Andrew M. Dai Diyi Yang Soroush Vosoughi ALM 37 46 0 26 May 2023
CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing Philipp Altmann Fabian Ritz Leonard Feuchtinger Jonas Nusslein Claudia Linnhoff-Popien Thomy Phan OOD OffRL 27 5 0 26 Apr 2023
Both eyes open: Vigilant Incentives help Regulatory Markets improve AI Safety Paolo Bova A. D. Stefano H. Anh 31 4 0 06 Mar 2023
Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties Hao Jiang Tien Mai Pradeep Varakantham M. Hoang OffRL 12 2 0 27 Jan 2023
LMPriors: Pre-Trained Language Models as Task-Specific Priors Kristy Choi Chris Cundy Sanjari Srivastava Stefano Ermon BDL 58 38 0 22 Oct 2022
Near-Optimal Multi-Agent Learning for Safe Coverage Control Manish Prajapat M. Turchetta Melanie Zeilinger Andreas Krause 35 14 0 12 Oct 2022
Policy Gradients for Probabilistic Constrained Reinforcement Learning Weiqin Chen D. Subramanian Santiago Paternain 29 6 0 02 Oct 2022
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability Mengdi Xu Zuxin Liu Peide Huang Wenhao Ding Zhepeng Cen Bo-wen Li Ding Zhao 79 45 0 16 Sep 2022
An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning Danil Provodin Pratik Gajane Mykola Pechenizkiy M. Kaptein 33 1 0 08 Sep 2022
Categorical semantics of compositional reinforcement learning Georgios Bakirtzis M. Savvas Ufuk Topcu CoGe 48 4 0 29 Aug 2022
Boolean Decision Rules for Reinforcement Learning Policy Summarisation J. McCarthy Rahul Nair Elizabeth M. Daly Radu Marinescu Ivana Dusparic 6 1 0 18 Jul 2022
Formalizing the Problem of Side Effect Regularization Alexander Matt Turner Aseem Saxena Prasad Tadepalli 27 2 0 23 Jun 2022
Aligning to Social Norms and Values in Interactive Narratives Prithviraj Ammanabrolu Liwei Jiang Maarten Sap Hannaneh Hajishirzi Yejin Choi AI4CE 28 47 0 04 May 2022
Graph Neural Network based Agent in Google Research Football Yizhan Niu Jinglong Liu Yuhao Shi Jiren Zhu GNN 27 2 0 23 Apr 2022
A Brief Guide to Designing and Evaluating Human-Centered Interactive Machine Learning K. Mathewson P. Pilarski HAI 21 4 0 20 Apr 2022
Dynamic Certification for Autonomous Systems Georgios Bakirtzis Steven Carr David Danks Ufuk Topcu 11 10 0 21 Mar 2022
Detecting danger in gridworlds using Gromov's Link Condition Thomas F Burns R. Tang AI4CE 31 2 0 17 Jan 2022
Learning to Minimize Cost-to-Serve for Multi-Node Multi-Product Order Fulfilment in Electronic Commerce Pranavi Pathakota Kunwar Zaid Anulekha Dhara Hardik Meisheri Shaun C. D'Souza Dheeraj Shah H. Khadilkar 16 4 0 16 Dec 2021
Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021) Peter Vamplew Benjamin J. Smith Johan Källström G. Ramos Roxana Rădulescu ... Fredrik Heintz Patrick Mannion Pieter J. K. Libin Richard Dazeley Cameron Foale LRM 29 66 0 25 Nov 2021
Model-Free Risk-Sensitive Reinforcement Learning Grégoire Delétang Jordi Grau-Moya M. Kunesch Tim Genewein Rob Brekelmans Shane Legg Pedro A. Ortega OOD 10 9 0 04 Nov 2021
Learning to Be Cautious Montaser Mohammedalamen Dustin Morrill Alexander Sieusahai Yash Satsangi Michael Bowling 18 3 0 29 Oct 2021
Generalized Out-of-Distribution Detection: A Survey Jingkang Yang Kaiyang Zhou Yixuan Li Ziwei Liu 193 881 0 21 Oct 2021
safe-control-gym: a Unified Benchmark Suite for Safe Learning-based Control and Reinforcement Learning in Robotics Zhaocong Yuan Adam W. Hall Siqi Zhou Lukas Brunke Melissa Greeff Jacopo Panerati Angela P. Schoellig OffRL 104 53 0 13 Sep 2021
Concave Utility Reinforcement Learning with Zero-Constraint Violations Mridul Agarwal Qinbo Bai Vaneet Aggarwal 36 12 0 12 Sep 2021