Stroke-Based Scene Text Erasing Using Synthetic Data for Training

23 April 2021

Zhengmi Tang

Abstract

Scene text erasing, which replaces text regions in natural images with plausible content, has drawn significant attention in the computer vision community in recent years. The task involves two subtasks: text detection and image inpainting. Both subtasks require considerable data to perform well; however, the lack of a large-scale real-world scene-text removal dataset prevents existing methods from reaching their full potential. To work around the shortage of paired real-world data, we improve a synthetic text engine and train our model only on the dataset it generates. Our proposed network contains a stroke mask prediction module and a background inpainting module. It extracts the text strokes from the cropped text image as a relatively small hole, which preserves more background content and leads to better inpainting results. The model can erase selected text instances in a scene image given a bounding box, or work with an existing scene-text detector for automatic scene text erasing. Qualitative and quantitative evaluation on the SCUT-Syn, ICDAR2013, and SCUT-EnsText datasets demonstrates that our method significantly outperforms existing state-of-the-art methods, even when those methods are trained on real-world data.
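The data flow described in the abstract (crop the text region, predict a stroke mask, inpaint, and replace only the stroke pixels) can be illustrated with a minimal sketch. This is not the paper's network: `threshold_stroke_mask` and `mean_fill_inpaint` are hypothetical stand-ins for the learned stroke mask prediction and background inpainting modules, and the function names, the `(x0, y0, x1, y1)` box format, and the dark-text threshold are all assumptions made for illustration.

```python
import numpy as np

def threshold_stroke_mask(crop, thresh=128):
    """Hypothetical stand-in for a learned stroke mask predictor.
    Marks dark pixels as text strokes (assumes dark text on a light background)."""
    gray = crop.mean(axis=2)
    return (gray < thresh).astype(np.float32)  # (h, w), 1.0 on strokes

def mean_fill_inpaint(crop, mask):
    """Hypothetical stand-in for a learned inpainting module.
    Fills the whole crop with the mean color of the non-stroke pixels."""
    background = mask < 0.5
    if background.any():
        fill = crop[background].mean(axis=0)
    else:
        fill = crop.reshape(-1, crop.shape[2]).mean(axis=0)
    return np.broadcast_to(fill, crop.shape).astype(np.float32)

def erase_text_in_box(image, box,
                      predict_mask=threshold_stroke_mask,
                      inpaint=mean_fill_inpaint):
    """Erase text inside box = (x0, y0, x1, y1), changing only stroke pixels."""
    x0, y0, x1, y1 = box
    out = image.astype(np.float32).copy()
    crop = out[y0:y1, x0:x1]
    mask = predict_mask(crop)        # small "hole" covering only the strokes
    filled = inpaint(crop, mask)     # candidate background for the crop
    m = mask[..., None]              # broadcast the mask over color channels
    # Composite: inpainted content on strokes, original pixels everywhere else.
    out[y0:y1, x0:x1] = m * filled + (1.0 - m) * crop
    return out.astype(image.dtype)
```

Because the composite keeps every non-stroke pixel from the original crop, the hole handed to the inpainting step is much smaller than the full bounding box, which is the motivation the abstract gives for predicting strokes first.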
