Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

International Conference on Pattern Recognition (ICPR), 2020

28 December 2020

Mélodie Boillet

Christopher Kermorvant

Thierry Paquet

Abstract

In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat line segmentation, and more generally layout analysis, as a pixel-wise classification task, so our model outputs a pixel labeling of the input image. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets, and we demonstrate that components pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models using various metrics to provide a fair and complete comparison between the methods.
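To make the pixel-wise formulation concrete, the following is a minimal sketch, not the Doc-UFCN implementation. It assumes the network produces a score per class for each pixel, takes the highest-scoring class as that pixel's label, and compares the result to a ground-truth label map with intersection-over-union (IoU). The function names, the two-class setup, and the choice of IoU as the metric are illustrative assumptions; the abstract only states that various metrics are used.

```python
from typing import List, Sequence


def label_pixels(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[int]]:
    """Turn per-pixel class scores (H x W x C) into a label map (H x W) by argmax."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


def pixel_iou(pred: Sequence[Sequence[int]],
              truth: Sequence[Sequence[int]],
              cls: int) -> float:
    """Pixel-level IoU for one class between a predicted and a reference label map."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred, in_truth = p == cls, t == cls
            inter += in_pred and in_truth
            union += in_pred or in_truth
    # If the class is absent from both maps, treat the prediction as perfect.
    return inter / union if union else 1.0


# Toy 2x3 "image" with two hypothetical classes: 0 = background, 1 = text line.
scores = [
    [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]],
    [[0.6, 0.4], [0.4, 0.6], [0.8, 0.2]],
]
truth = [
    [0, 1, 1],
    [0, 0, 0],
]

pred = label_pixels(scores)          # per-pixel labels from the scores
iou = pixel_iou(pred, truth, cls=1)  # overlap of predicted and true text-line pixels
```

In this toy case one background pixel is mislabeled as text line, so the text-line IoU falls below 1. Real evaluations would aggregate such scores over whole test sets, possibly alongside object-level metrics.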
