Jointly Learning to Construct and Control Agents using Deep
Reinforcement Learning
The physical design of a robot and the policy that controls its motion are inherently coupled. However, existing approaches largely ignore this coupling and instead alternate between separate design and control phases, which requires expert intuition throughout and risks convergence to suboptimal designs. In this work, we propose a method that jointly optimizes the physical design of a robot and the corresponding control policy in a model-free fashion, without any need for expert supervision. For an arbitrary robot morphology, our method maintains a distribution over design parameters and uses reinforcement learning to train a shared neural network controller for sampled designs. Throughout training, we use this policy network to quickly evaluate new designs and refine the design distribution to maximize expected reward. The result is a set of robot parameters and a neural network policy that are jointly optimized. We evaluate our approach in the context of legged locomotion and demonstrate that it discovers novel robot designs and walking gaits for several different morphologies, outperforming both a baseline approach and hand-crafted designs.
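The sample-and-refine loop described above can be illustrated with a score-function (REINFORCE) update on a Gaussian distribution over design parameters. This is a minimal sketch under stated assumptions, not the paper's implementation: the toy `episode_reward` function and its target design are hypothetical stand-ins for rollouts of the shared policy network, and the controller-training half of the joint optimization is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy reward: in the paper this would be an episode return
# obtained by running the shared neural network controller on a sampled
# design; here a simple quadratic stands in for illustration.
def episode_reward(design):
    target = np.array([0.8, 0.2])           # assumed "good" design (illustrative)
    return -np.sum((design - target) ** 2)  # higher is better near the target

# Maintain a Gaussian distribution over design parameters (mean mu, fixed sigma).
mu, sigma = np.zeros(2), 0.3
lr, n_samples = 0.05, 64

for step in range(200):
    # Sample candidate designs from the current distribution.
    designs = mu + sigma * rng.standard_normal((n_samples, 2))
    rewards = np.array([episode_reward(d) for d in designs])
    # Score-function (REINFORCE) gradient w.r.t. mu:
    # grad_mu log N(d; mu, sigma) = (d - mu) / sigma^2.
    advantages = rewards - rewards.mean()    # baseline for variance reduction
    grad_mu = (advantages[:, None] * (designs - mu) / sigma**2).mean(axis=0)
    mu += lr * grad_mu                       # shift distribution toward high reward

print(np.round(mu, 2))  # mean design drifts toward the high-reward region
```

In the full method, each sampled design would also generate experience for updating the shared controller, so the policy improves alongside the design distribution rather than being held fixed as it is here.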