SelQA: A New Benchmark for Selection-based Question Answering

Abstract
This paper presents a new dataset to benchmark selection-based question answering. Our dataset contains contexts drawn from the ten most prevalent topics in the English Wikipedia. To generate a large, diverse, and challenging dataset, we propose a new annotation scheme consisting of a series of crowdsourcing tasks that can be easily followed by any researcher. Several systems are compared on the tasks of answer sentence selection and answer triggering, providing strong baseline results for future work to improve upon. We hope that providing this large corpus will enable researchers to work towards more effective open-domain question answering.