This paper aims to reduce the time to annotate images for panoptic segmentation, which requires annotating segmentation masks and class labels for all object instances and stuff regions. We formulate our approach as a collaborative process between an annotator and an automated assistant who take turns to jointly annotate an image using a predefined pool of segments. Actions performed by the annotator serve as a strong contextual signal. The assistant intelligently reacts to this signal by annotating other parts of the image on its own, which reduces the amount of work required by the annotator. We perform thorough experiments on the COCO panoptic dataset, both in simulation and with human annotators. These demonstrate that our approach is 5x faster than manual polygon drawing tools, and is also significantly faster than the recent machine-assisted interface of [Andriluka ACMMM 2018]. Furthermore, we show on ADE20k that our method can be used to efficiently annotate new datasets, bootstrapping from a very small amount of annotated data.
View on arXiv