An Empirical Comparison of Simple Domain Adaptation Methods for Neural Machine Translation

12 January 2017

Sadao Kurohashi

Abstract

In this paper, we compare two simple domain adaptation methods for neural machine translation (NMT): (1) we append an artificial token to the source sentences of two parallel corpora (from different domains, one of them resource-scarce) to indicate the domain, and then mix the corpora to learn a multi-domain NMT model; (2) we learn an NMT model on the resource-rich domain corpus and then fine-tune it using the resource-poor domain corpus. We empirically verify that fine-tuning works better than the artificial token mechanism when the low-resource domain corpus is of relatively poor quality (acquired via automatic extraction), but that with a high-quality (manually created) low-resource domain corpus both methods are equally viable.
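The data preparation behind method (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the token strings `<2rich>` and `<2poor>`, the function names, and the plain shuffle used for mixing are all assumptions, since the abstract specifies neither the token format nor the mixing strategy. Method (2) needs no change to the data; it only changes the training schedule (train on the resource-rich corpus, then continue training on the resource-poor one).

```python
import random


def tag_source(pairs, domain_token):
    """Append a domain token to the source side of each (source, target) pair."""
    return [(f"{src} {domain_token}", tgt) for src, tgt in pairs]


def mix_corpora(rich_pairs, poor_pairs,
                rich_token="<2rich>", poor_token="<2poor>", seed=0):
    """Tag both corpora with their (hypothetical) domain tokens and mix them.

    Mixing here is a seeded shuffle of the concatenation, which is an
    assumption; the abstract does not say how the corpora are combined.
    """
    mixed = tag_source(rich_pairs, rich_token) + tag_source(poor_pairs, poor_token)
    random.Random(seed).shuffle(mixed)
    return mixed


# Toy parallel data, invented purely for illustration.
rich = [("source sentence a", "target sentence a"),
        ("source sentence b", "target sentence b")]
poor = [("source sentence c", "target sentence c")]

mixed = mix_corpora(rich, poor)
```

Each source sentence in `mixed` now ends with the token of the corpus it came from, so a single model trained on the mixed data can condition on the domain through that token.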
