Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
In this work we present a framework for the recognition of natural scene text. We use purely data-driven, deep learning models that perform word recognition on the whole image at once, departing from the character-based recognition systems of the past. These models are trained solely on data produced by a synthetic text generation engine -- synthetic data that is highly realistic and sufficient to replace real data, giving us an effectively unlimited supply of training data. This abundance of data opens up new possibilities for word recognition models, and here we introduce three novel models, each one "reading" words in a complementary way: via large-scale dictionary encoding, character sequence encoding, and bag-of-N-gram encoding. In both language- or lexicon-based and completely unconstrained text recognition scenarios, we demonstrate state-of-the-art performance on standard datasets, using our fast, simple machinery and requiring zero data-acquisition costs.
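To make the three complementary "readings" concrete, the following minimal sketch shows how a single word might be turned into a training target under each encoding. It is an illustration only, not the authors' implementation: the alphabet, the maximum word length `MAX_LEN`, the N-gram order limit `max_n`, and all function names are assumptions introduced here.

```python
from typing import List, Set

# Assumed character set and maximum word length (illustrative values only).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
MAX_LEN = 23
NULL = len(ALPHABET)  # extra class index used to pad short words


def dictionary_target(word: str, lexicon: List[str]) -> int:
    """Dictionary encoding: the whole word is one class in a fixed lexicon."""
    return lexicon.index(word.lower())


def char_sequence_target(word: str) -> List[int]:
    """Character sequence encoding: one class per position, padded with NULL."""
    if len(word) > MAX_LEN:
        raise ValueError("word is longer than MAX_LEN")
    indices = [ALPHABET.index(c) for c in word.lower()]
    return indices + [NULL] * (MAX_LEN - len(indices))


def bag_of_ngrams_target(word: str, max_n: int = 4) -> Set[str]:
    """Bag-of-N-grams encoding: the unordered set of substrings up to max_n."""
    w = word.lower()
    return {
        w[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(w) - n + 1)
    }


if __name__ == "__main__":
    lexicon = ["dog", "cat", "spires"]  # hypothetical three-word lexicon
    print(dictionary_target("cat", lexicon))
    print(char_sequence_target("cat")[:5])
    print(sorted(bag_of_ngrams_target("cat", max_n=2)))
```

Under these assumptions, the dictionary target can only name words in the lexicon, the character sequence target can spell any string up to `MAX_LEN`, and the bag-of-N-grams target records which substrings occur without recording their order, which is one way to see why the three encodings complement each other.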