Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task

End-to-end control for robot manipulation and grasping is emerging as an attractive alternative to traditional pipelined approaches. However, end-to-end methods tend to be slow to train, to exhibit little or no generalisability, or to lack the ability to accomplish long-horizon or multi-stage tasks. In this paper, we show how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task that is analogous to a simple tidying routine, without having seen a single real image. The task involves locating, reaching for, and grasping a cube, then locating a basket and dropping the cube into it. The first technique is to utilise the full state from a simulator to collect a series of control velocities which accomplish the task. The second technique is to utilise domain randomisation to allow the controller to generalise to the real world. Our results show that we are able to successfully accomplish the task in the real world, that the controller generalises to novel environments, including those with novel lighting conditions and distractor objects, and that it can deal with moving objects, including the basket itself. We believe our approach to be simple, highly scalable and capable of learning long-horizon tasks that have so far not been shown with the state-of-the-art in end-to-end robot control.
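As a rough illustration of the two techniques, the sketch below samples randomised scene parameters for each simulated episode and computes a velocity label from the simulator's full state. It is not the implementation used in this work: the names `SceneParams`, `sample_scene` and `expert_velocity`, the parameter ranges, and the use of a clipped proportional controller are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Hypothetical per-episode appearance parameters (assumed, not from the paper)."""
    cube_rgb: tuple
    basket_rgb: tuple
    light_intensity: float
    camera_offset: tuple
    num_distractors: int


def sample_scene(rng):
    """Domain randomisation: draw a fresh scene appearance for one episode."""
    def colour():
        return tuple(rng.random() for _ in range(3))

    return SceneParams(
        cube_rgb=colour(),
        basket_rgb=colour(),
        light_intensity=rng.uniform(0.3, 1.5),  # assumed range
        camera_offset=tuple(rng.uniform(-0.05, 0.05) for _ in range(3)),  # metres, assumed
        num_distractors=rng.randint(0, 3),  # assumed range
    )


def expert_velocity(gripper_xyz, target_xyz, gain=1.0, max_speed=0.1):
    """Velocity label computed from full simulator state.

    A proportional controller towards the target, clipped to max_speed.
    This stands in for whatever planner generates the training velocities.
    """
    velocity = [gain * (t - g) for g, t in zip(gripper_xyz, target_xyz)]
    norm = sum(v * v for v in velocity) ** 0.5
    if norm > max_speed:
        velocity = [v * max_speed / norm for v in velocity]
    return velocity


if __name__ == "__main__":
    rng = random.Random(0)
    scene = sample_scene(rng)
    label = expert_velocity((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
    print(scene, label)
```

In a full pipeline, each rendered image of the randomised scene would be paired with the corresponding velocity label to form one supervised training example for the image-to-velocity controller.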