Revisit Semantic Representation and Tree Search for Similar Question Retrieval

22 August 2019

Abstract

This paper studies the performance of BERT and a tree-based structure on the short sentence ranking task. We fine-tune BERT on the training data to obtain semantic vectors, or sentence embeddings, for the test data. We use all the sentence embeddings to build a tree based on k-means and perform beam search at prediction time when given a sentence as a query. We run experiments on the semantic textual similarity dataset Quora Question Pairs, which we process for sentence ranking. Experimental results show that our methods outperform the strong baseline. Our tree accelerates prediction speed by 500%-1000% without losing too much ranking accuracy.
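The retrieval scheme described in the abstract (a tree built with k-means over sentence embeddings, searched with a beam at query time) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the embeddings are random stand-ins for fine-tuned BERT vectors, and the function names, the branching factor, the leaf size, the beam width, and the use of squared Euclidean distance are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric; the paper may use another)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))

def assign(vectors, ids, centroids):
    """Group ids by their nearest centroid."""
    groups = [[] for _ in centroids]
    for i in ids:
        groups[nearest(vectors[i], centroids)].append(i)
    return groups

def kmeans(vectors, ids, k, rng, iters=10):
    """Plain Lloyd's k-means; returns (centroid, member_ids) for non-empty clusters."""
    centroids = [list(vectors[i]) for i in rng.sample(ids, k)]
    for _ in range(iters):
        for c, members in enumerate(assign(vectors, ids, centroids)):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*(vectors[i] for i in members))]
    # Final assignment so that membership matches the stored centroids.
    groups = assign(vectors, ids, centroids)
    return [(centroids[c], g) for c, g in enumerate(groups) if g]

def build_tree(vectors, ids, branching=4, leaf_size=8, rng=None):
    """Recursively split ids with k-means until a cluster fits in one leaf."""
    rng = rng or random.Random(0)
    if len(ids) <= leaf_size:
        return {"items": list(ids)}
    clusters = kmeans(vectors, ids, min(branching, len(ids)), rng)
    if len(clusters) < 2:  # could not split, e.g. all vectors identical
        return {"items": list(ids)}
    return {"children": [
        {"centroid": c, **build_tree(vectors, members, branching, leaf_size, rng)}
        for c, members in clusters
    ]}

def beam_search(root, query, vectors, beam_width=2, top_k=5):
    """Descend level by level, keeping only the beam_width closest nodes."""
    frontier, candidates = [root], []
    while frontier:
        next_level = []
        for node in frontier:
            if "items" in node:
                candidates.extend(node["items"])   # leaf: collect its sentences
            else:
                next_level.extend(node["children"])
        next_level.sort(key=lambda n: sq_dist(query, n["centroid"]))
        frontier = next_level[:beam_width]
    # Re-rank only the collected candidates instead of the whole corpus.
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:top_k]

# Hypothetical usage with random vectors standing in for sentence embeddings.
demo_rng = random.Random(0)
vectors = [[demo_rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
tree = build_tree(vectors, list(range(len(vectors))))
print(beam_search(tree, vectors[17], vectors, beam_width=2, top_k=3))
```

In a sketch like this, the beam width is the knob that trades speed for accuracy: a narrow beam scores only a few centroids and leaves per query, while a very wide beam visits every leaf and degenerates into exhaustive ranking.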
