Ultrafast image categorization in vivo and in silico

Humans categorize images very efficiently; in particular, they can detect the presence of an animal very rapidly. Recently, deep learning algorithms have achieved higher accuracy than humans on a large set of visual recognition tasks. However, the tasks on which these artificial networks are trained and evaluated tend to be very specialized, and performance does not generalize well beyond them: accuracy drops, for example, after a simple rotation of the image. In this regard, biological visual systems remain more flexible and efficient than artificial systems on more generic tasks, such as detecting an animal. To further the comparison between biological and artificial neural networks, we retrained the standard VGG16 convolutional neural network (CNN) on two independent tasks that are ecologically relevant to humans: detecting the presence of an animal or of an artifact. We show that the retrained networks achieve a human-like level of performance, comparable to that reported in psychophysical tasks. Moreover, we show that categorization improves when the two models' outputs are combined: animals (e.g., lions) tend to be less present in photographs that contain artifacts (e.g., buildings). Furthermore, the retrained models reproduced some unexpected behavioral observations from human psychophysics, such as robustness to rotations (e.g., an upside-down or tilted image) or to a grayscale transformation. Finally, we quantified the number of CNN layers needed to achieve such performance and found that good accuracy for ultrafast image categorization can be reached with only a few layers, challenging the belief that image recognition requires a deep sequential analysis of visual objects.
View on arXiv