Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents

11 January 2024 · arXiv:2401.05821
Quentin Delfosse, Sebastian Sztwiertnia, M. Rothermel, Wolfgang Stammer, Kristian Kersting
Links: ArXiv · PDF · HTML

Papers citing "Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents"

13 / 13 papers shown.

  1. Interpretable end-to-end Neurosymbolic Reinforcement Learning agents
     Nils Grandien, Quentin Delfosse, Kristian Kersting · OffRL · 18 Oct 2024 (27 · 2 · 0)
  2. BlendRL: A Framework for Merging Symbolic and Neural Policy Learning
     Hikaru Shindo, Quentin Delfosse, D. Dhami, Kristian Kersting · 15 Oct 2024 (43 · 3 · 0)
  3. Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations
     Yupei Yang, Biwei Huang, Fan Feng, Xinyue Wang, Shikui Tu, Lei Xu · CML, OOD, TTA · 30 Jul 2024 (38 · 1 · 0)
  4. Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning
     Hector Kohler, Quentin Delfosse, R. Akrour, Kristian Kersting, Philippe Preux · 23 May 2024 (62 · 14 · 0)
  5. Boosting Object Representation Learning via Motion and Object Continuity
     Quentin Delfosse, Wolfgang Stammer, Thomas Rothenbacher, Dwarak Vittal, Kristian Kersting · OCL · 16 Nov 2022 (37 · 20 · 0)
  6. Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences
     L. Guan, Karthik Valmeekam, Subbarao Kambhampati · 28 Oct 2022 (51 · 8 · 0)
  7. Neural Networks are Decision Trees
     Çağlar Aytekin · FAtt · 11 Oct 2022 (32 · 24 · 0)
  8. GlanceNets: Interpretabile, Leak-proof Concept-based Models
     Emanuele Marconato, Andrea Passerini, Stefano Teso · 31 May 2022 (106 · 64 · 0)
  9. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
     Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou · LM&Ro, LRM, AI4CE, ReLM · 28 Jan 2022 (358 · 8,495 · 0)
 10. Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations
     Wolfgang Stammer, Marius Memmel, P. Schramowski, Kristian Kersting · 04 Dec 2021 (91 · 26 · 0)
 11. Adaptive Rational Activations to Boost Deep Reinforcement Learning
     Quentin Delfosse, P. Schramowski, Martin Mundt, Alejandro Molina, Kristian Kersting · 18 Feb 2021 (37 · 14 · 0)
 12. AI safety via debate
     G. Irving, Paul Christiano, Dario Amodei · 02 May 2018 (204 · 199 · 0)
 13. You Only Look Once: Unified, Real-Time Object Detection
     Joseph Redmon, S. Divvala, Ross B. Girshick, Ali Farhadi · ObjD · 08 Jun 2015 (289 · 36,320 · 0)