Clicktionary: A Web-based Game for Exploring the Atoms of Object Recognition

Understanding which visual features and representations contribute to human object recognition may provide scaffolding for more effective artificial vision systems. While recent advances in Deep Convolutional Networks (DCNs) have led to systems approaching human accuracy, it is unclear whether they rely on the same visual features as humans for object recognition. We introduce Clicktionary, a competitive web-based game for discovering features that humans use for object recognition: one participant in a pair sequentially reveals parts of an object in an image until the other correctly identifies its category. Scoring image regions according to their proximity to correct recognition yields maps of visual feature importance for individual images. We find that these "realization" maps exhibit only weak correlation with relevance maps derived from DCNs or image salience algorithms. Cueing DCNs to attend to features emphasized by these maps improves their object recognition accuracy. Our results thus suggest that realization maps identify visual features that humans deem important for object recognition but that are not adequately captured by DCNs. To rectify this shortcoming, we propose a novel web-based application for acquiring realization maps at scale, with the aim of improving the state-of-the-art in object recognition.
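The scoring idea in the abstract (regions revealed closer to the moment of correct recognition count as more important) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear weighting `k / n`, the circular reveal of a given `radius`, the per-pixel averaging across games, the use of Pearson correlation, and all function names are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Click = Tuple[int, int]  # (row, col) center of one revealed region (assumed format)
Map2D = List[List[float]]


def realization_map(clicks: Sequence[Click], height: int, width: int,
                    radius: int = 1) -> Map2D:
    """Score pixels by how close their reveal was to the correct guess.

    Assumption: the k-th of n reveals (1-based) gets weight k / n, so the
    last reveal before recognition scores 1.0. A pixel covered by several
    reveals keeps its highest weight.
    """
    scores = [[0.0] * width for _ in range(height)]
    n = len(clicks)
    for k, (r0, c0) in enumerate(clicks, start=1):
        weight = k / n
        # Visit only the bounding box of the reveal, clipped to the image.
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                if (r - r0) ** 2 + (c - c0) ** 2 <= radius ** 2:
                    scores[r][c] = max(scores[r][c], weight)
    return scores


def average_maps(maps: Sequence[Map2D]) -> Map2D:
    """Average per-game maps of identical size, pixel by pixel."""
    n = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(w)]
            for r in range(h)]


def pearson(a: Map2D, b: Map2D) -> float:
    """Pearson correlation between two flattened maps of the same size."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant map
    return cov / (vx * vy) ** 0.5
```

Under these assumptions, one would build a map per game with `realization_map`, combine games for the same image with `average_maps`, and compare the result against a DCN relevance map or a salience map of the same size with `pearson`.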