A learning perspective on the emergence of abstractions: the curious case of phonemes

In the present paper we use a range of modeling techniques to investigate whether an abstract phone could emerge from exposure to speech sounds. In effect, the study represents an attempt to operationalize a theoretical device of Usage-based Linguistics: the emergence of an abstraction from language use. Our quest focuses on the simplest of such hypothesized abstractions. We test two opposing principles regarding the development of language knowledge in linguistically untrained language users: Memory-Based Learning (MBL) and Error-Correction Learning (ECL). A process of generalization underlies the abstractions linguists operate with, and we probed whether MBL and ECL could give rise to a type of language knowledge that resembles linguistic abstractions. Each model was presented with a substantial amount of pre-processed speech produced by one speaker. We assessed the consistency or stability of what these simple models learned and their ability to give rise to abstract categories. The two types of models fare differently on these tests. We show that ECL models can learn abstractions and that at least part of the phone inventory, and its grouping into traditional types, can be reliably identified from the input.