Fracking Deep Convolutional Image Descriptors

19 December 2014

E. Simo-Serra

Eduard Trulls

Luis Ferraz

Iasonas Kokkinos

Francesc Moreno-Noguer

Abstract

In this paper we propose a novel framework for learning local image descriptors in a discriminative manner. For this purpose we explore a siamese architecture of Deep Convolutional Neural Networks (CNN), with a hinge embedding loss on the L2 distance between descriptors. Since a siamese architecture trains on pairs rather than single image patches, there is a large number of positive samples and an exponential number of negative samples. We propose to explore this space with stochastic sampling of the training set, in combination with an aggressive mining strategy over both the positive and negative samples, which we denote as "fracking". We perform a thorough evaluation of the architecture hyper-parameters and demonstrate very large performance gains over both standard CNN learning strategies and hand-crafted image descriptors such as SIFT, of up to 2.5x in terms of the area under the precision-recall curve.
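The pair loss and the mining step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: descriptors are plain float sequences standing in for CNN outputs, the helper names `hinge_embedding_loss` and `mine_hardest` are hypothetical, and the default `margin` value is a hypothetical placeholder. For a matching pair the loss is the L2 distance between the two descriptors; for a non-matching pair it is max(0, margin - distance). Mining is assumed here to mean scoring a larger pool of candidate pairs and keeping only the highest-loss positives and negatives for back-propagation.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]
Pair = Tuple[Descriptor, Descriptor]


def hinge_embedding_loss(d1: Descriptor, d2: Descriptor,
                         is_match: bool, margin: float = 1.0) -> float:
    """Hinge embedding loss on the L2 distance between two descriptors.

    Matching pairs are penalized by their distance; non-matching pairs are
    penalized only if they are closer than `margin` (hypothetical default).
    """
    dist = math.dist(d1, d2)  # Euclidean (L2) distance, Python 3.8+
    if is_match:
        return dist
    return max(0.0, margin - dist)


def _hardest(pairs: List[Pair], is_match: bool, keep: int,
             margin: float) -> List[int]:
    """Indices of the `keep` pairs with the largest loss, hardest first."""
    losses = [hinge_embedding_loss(a, b, is_match, margin) for a, b in pairs]
    order = sorted(range(len(pairs)), key=lambda i: losses[i], reverse=True)
    return order[:keep]


def mine_hardest(pos_pairs: List[Pair], neg_pairs: List[Pair],
                 n_pos: int, n_neg: int,
                 margin: float = 1.0) -> Tuple[List[int], List[int]]:
    """Sketch of mining over both positives and negatives (assumed scheme).

    From pools of candidate positive and negative pairs, keep the `n_pos`
    positives and `n_neg` negatives with the highest loss; only these would
    be used to compute gradients.
    """
    return (_hardest(pos_pairs, True, n_pos, margin),
            _hardest(neg_pairs, False, n_neg, margin))


if __name__ == "__main__":
    # Positive pool: distances 0.1 (easy) and 5.0 (hard).
    pos = [([0.0, 0.0], [0.0, 0.1]), ([0.0, 0.0], [3.0, 4.0])]
    # Negative pool with margin 1.0: losses 0.8 (hard) and 0.0 (easy).
    neg = [([0.0, 0.0], [0.0, 0.2]), ([0.0, 0.0], [3.0, 4.0])]
    print(mine_hardest(pos, neg, n_pos=1, n_neg=1, margin=1.0))
```

Running the example keeps the far-apart positive pair and the close-together negative pair, i.e. the pairs the current descriptor handles worst. In a real training loop the pools would be drawn stochastically from the training set and the losses computed on CNN outputs rather than fixed vectors.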
