
Text classification with sparse composite document vectors

Abstract

We present a modified feature formation technique, the sparse document vector (SDV), derived from graded-weighted Bag of Word Vectors (BoWV) (Vivek Gupta, 2016), for faster and better composite document feature representation. We propose a very simple feature construction algorithm that potentially overcomes many weaknesses of current distributional vector representations and of other composite document representation methods widely used for text representation. Through extensive experiments on multi-class classification on the 20NewsGroup dataset and multi-label text classification on Reuters-21578, we achieve better performance and a significant reduction in training and prediction time compared with the composite document representation methods BoWV (Vivek Gupta, 2016) and TWE (Liu et al., 2015b). We also significantly outperform the current state-of-the-art method NTSG (Liu et al., 2015a) on the standard performance metric for 20NewsGroup.
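The abstract does not spell out the construction, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that the graded-weighted composite vector is built by giving each word a soft cluster-membership probability P(c_k|w) and an idf weight, placing the weighted word vector into the slot of each of the K clusters, and summing over the document to get a K*d vector. Sparsity is then obtained by zeroing small components with a relative threshold `p`. The function names `composite_doc_vector` and `sparsify`, the threshold rule, and the toy vocabulary are all hypothetical. In practice the membership probabilities might come from a soft clustering of the word vectors, such as a Gaussian mixture; here they are supplied by hand.

```python
from typing import Dict, List, Sequence


def composite_doc_vector(
    tokens: Sequence[str],
    word_vecs: Dict[str, List[float]],
    cluster_probs: Dict[str, List[float]],
    idf: Dict[str, float],
) -> List[float]:
    """Hypothetical graded-weighted composite vector of length K * d.

    Each in-vocabulary token adds idf(w) * P(c_k|w) * v_w into the
    block reserved for cluster k; blocks are concatenated.
    """
    dim = len(next(iter(word_vecs.values())))
    n_clusters = len(next(iter(cluster_probs.values())))
    doc = [0.0] * (n_clusters * dim)
    for w in tokens:
        if w not in word_vecs:
            continue  # skip out-of-vocabulary words
        v, probs, weight = word_vecs[w], cluster_probs[w], idf.get(w, 0.0)
        for k, p_k in enumerate(probs):
            base = k * dim  # start of cluster k's block
            for j, x in enumerate(v):
                doc[base + j] += weight * p_k * x
    return doc


def sparsify(docs: List[List[float]], p: float = 4.0) -> List[List[float]]:
    """Zero components below an assumed data-driven threshold.

    threshold = (p / 100) * (|avg of per-doc minima| + |avg of per-doc maxima|) / 2
    """
    a_min = sum(min(d) for d in docs) / len(docs)
    a_max = sum(max(d) for d in docs) / len(docs)
    threshold = (p / 100.0) * (abs(a_min) + abs(a_max)) / 2.0
    return [[x if abs(x) >= threshold else 0.0 for x in d] for d in docs]


# Toy, made-up inputs: 2-dimensional word vectors, K = 2 clusters.
word_vecs = {"goal": [1.0, 0.0], "match": [0.5, 0.5], "stock": [0.0, 1.0]}
cluster_probs = {"goal": [0.9, 0.1], "match": [0.5, 0.5], "stock": [0.1, 0.9]}
idf = {"goal": 1.0, "match": 0.5, "stock": 1.0}

vecs = [
    composite_doc_vector(["goal", "match"], word_vecs, cluster_probs, idf),
    composite_doc_vector(["stock"], word_vecs, cluster_probs, idf),
]
sparse = sparsify(vecs, p=20.0)
print(sparse)
```

With these toy inputs the second document's small 0.1 component falls below the threshold (about 0.1025) and is zeroed, leaving only its dominant cluster-2 entry. Zeroing many such small entries is what would make downstream classifier training and prediction cheaper, which is one plausible reading of the speed-up the abstract reports.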
