Two-sample test based on Self-Organizing Maps

17 December 2022

A. Álvarez-Ayllón

M. Palomo-duarte

J. Dodero

ArXiv (abs)PDF HTML Github

Main:20 Pages

19 Figures

Bibliography:4 Pages

3 Tables

Appendix:3 Pages

Abstract

Machine-learning classifiers can be leveraged as a two-sample statistical test. Suppose each sample is assigned a different label and that a classifier can obtain a better-than-chance result discriminating them. In this case, we can infer that both samples originate from different populations. However, many types of models, such as neural networks, behave as a black-box for the user: they can reject that both samples originate from the same population, but they do not offer insight into how both samples differ. Self-Organizing Maps are a dimensionality reduction initially devised as a data visualization tool that displays emergent properties, being also useful for classification tasks. Since they can be used as classifiers, they can be used also as a two-sample statistical test. But since their original purpose is visualization, they can also offer insights.

View on arXiv

Comments on this paper