A New Architecture to Learn High-level Visual Features for Image Retrieval using Clickthrough Data

17 December 2013

Abstract

Image retrieval refers to finding relevant images in an image database for a given query, a task that is not difficult for humans if the time cost is ignored. For machines, however, image retrieval is extremely challenging, because machines are poor at understanding images from raw pixels or even from expert-designed visual features. Recently, deep neural networks trained on large-scale datasets have achieved great success in recognizing thousands of objects and have demonstrated their potential for learning better image representations. In this paper, we propose a multi-task DNN named Ring Training for image retrieval to learn high-level visual features from clickthrough data. The multi-task DNN contains two parts: query-sharing layers that compute the image representation, and query-specific layers that estimate relevance. Experimental results on both simulated and real datasets show the effectiveness of the proposed approach.
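The two-part structure described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. It is not the paper's implementation. The class name `MultiTaskRelevanceNet`, the single hidden layer, the ReLU and sigmoid activations, the layer sizes, and the random initialization are all assumptions chosen for illustration, and training is omitted entirely.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (an arbitrary, illustrative choice)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskRelevanceNet:
    """Hypothetical sketch: shared image layers plus one relevance head per query."""

    def __init__(self, n_pixels, n_hidden, queries, seed=0):
        rng = random.Random(seed)
        # Query-sharing part: the same weights are used for every query.
        self.shared = init_layer(rng, n_pixels, n_hidden)
        # Query-specific part: each query (task) owns its own output layer.
        self.heads = {q: init_layer(rng, n_hidden, 1) for q in queries}

    def represent(self, image):
        """Image representation computed by the query-sharing layers."""
        return relu(dense(image, *self.shared))

    def relevance(self, query, image):
        """Relevance estimate in (0, 1) from the head belonging to `query`."""
        weights, bias = self.heads[query]
        return sigmoid(dense(self.represent(image), weights, bias)[0])


net = MultiTaskRelevanceNet(n_pixels=4, n_hidden=3, queries=["cat", "dog"])
image = [0.1, 0.9, 0.3, 0.5]  # toy stand-in for pixel values
scores = {q: net.relevance(q, image) for q in net.heads}
```

In this sketch the representation returned by `represent` does not depend on the query, so under this setup clicks observed for any query would update the same shared weights, while each head only has to map that representation to a relevance score for its own query.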
