Distilling Word Embeddings: An Encoding Approach

International Conference on Information and Knowledge Management (CIKM), 2015

15 June 2015

Yan Xu

Abstract

Distilling knowledge from a well-trained, cumbersome network into a small one has recently become a research topic, since lightweight neural networks with high performance are particularly needed in resource-restricted systems. This paper addresses the problem of distilling word embeddings for NLP tasks. We propose an encoding approach that distills task-specific knowledge from high-dimensional embeddings, retaining high performance while substantially reducing model complexity. Experimental results show that our method outperforms directly training neural networks with small embeddings.
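The abstract does not spell out the form of the encoder, so the following is only a minimal sketch of one plausible reading: a single nonlinear layer that maps each high-dimensional embedding to a low-dimensional one. The function name `encode_embeddings`, the tanh activation, and all sizes are assumptions made for illustration, not details taken from the paper; in practice the weights would be learned on the target task rather than drawn at random.

```python
import numpy as np


def encode_embeddings(large, weight, bias):
    """Map (vocab, D) embeddings to (vocab, d) through one tanh layer.

    Hypothetical helper: the paper's actual encoder may differ.
    """
    return np.tanh(large @ weight + bias)


rng = np.random.default_rng(0)

# Illustrative sizes only; none of these numbers come from the paper.
vocab_size, large_dim, small_dim = 1000, 300, 50

# Stand-in for pretrained high-dimensional embeddings.
large = rng.normal(size=(vocab_size, large_dim))

# Encoder parameters. Random here; assumed to be trained with task supervision.
weight = rng.normal(scale=0.1, size=(large_dim, small_dim))
bias = np.zeros(small_dim)

# The encoded vectors play the role of the small, distilled embeddings.
small = encode_embeddings(large, weight, bias)
print(small.shape)  # (1000, 50)
```

Under this reading, the small matrix can be stored and used in place of the large one, so a downstream network only needs a `small_dim`-wide input layer.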
