Neural-encoding Human Experts' Domain Knowledge to Warm Start
Reinforcement Learning
- OffRL
Deep reinforcement learning has seen great success across a breadth of tasks, such as game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth of readily available knowledge from domain experts that could help "warm start" the learning process. Further, learning-from-demonstration techniques are not yet efficient enough to infer this knowledge through sampling-based mechanisms in large state and action spaces. We present a new reinforcement learning architecture that encodes expert knowledge, in the form of propositional logic, directly into a neural, tree-like structure of fuzzy propositions amenable to gradient descent, and we show that this architecture outperforms reinforcement and imitation learning techniques across an array of reinforcement learning challenges. We further conduct a user study to solicit expert policies from a variety of humans and find that humans are able to specify policies that achieve higher reward both before and after training relative to baseline methods, demonstrating the utility of our approach.
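The core idea, relaxing crisp propositional logic into differentiable fuzzy operations so an expert rule can be tuned by gradient descent, can be sketched as follows. This is a hypothetical illustration under common fuzzy-logic conventions (product t-norm, probabilistic sum), not the paper's exact formulation; the rule, thresholds, and `sharpness` parameter are invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuzzy_gt(x, threshold, sharpness=5.0):
    """Soft truth value of the proposition x > threshold.

    The sharpness parameter (an assumption of this sketch) controls
    how closely the sigmoid approximates a hard comparison.
    """
    return sigmoid(sharpness * (x - threshold))

def fuzzy_and(a, b):
    return a * b          # product t-norm

def fuzzy_or(a, b):
    return a + b - a * b  # probabilistic sum

def fuzzy_not(a):
    return 1.0 - a

# A toy expert rule: IF (x0 > t0) AND NOT (x1 > t1) THEN act.
# Because every operation is smooth, the thresholds t0 and t1
# (and the sharpness) can be adjusted by gradient descent.
def rule(x, t0=0.5, t1=0.2):
    return fuzzy_and(fuzzy_gt(x[0], t0), fuzzy_not(fuzzy_gt(x[1], t1)))

print(round(float(rule(np.array([0.9, 0.0]))), 3))  # prints 0.644
```

The rule's output lies in [0, 1] and rises as the state better satisfies the expert's condition, which is what makes the tree of propositions amenable to warm-starting a policy before further training.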
View on arXiv