Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts

Machine reading comprehension (MRC) is a sub-field in natural language processing or computational linguistics. MRC aims to help computers understand unstructured texts and then answer questions related to them. In this paper, we present a new Vietnamese corpus for conversational machine reading comprehension (UIT-ViCoQA), consisting of 10,000 questions with answers over 2,000 conversations about health news articles. We analyze UIT-ViCoQA in depth with different linguistic aspects. Then, we evaluate several baseline models about dialogue and reading comprehension on the UIT-ViCoQA corpus. The best model obtains an F1 score of 45.27%, which is 30.91 points behind human performance (76.18%), indicating that there is ample room for improvement.
View on arXiv