Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop
O. Scharenborg
Laurent Besacier
A. Black
M. Hasegawa-Johnson
Florian Metze
Graham Neubig
Sebastian Stüker
Pierre Godard
Markus Müller
Lucas Ondel
Shruti Palaskar
Philip Arthur
Francesco Ciannella
Mingxing Du
Elin Larsen
Danny Merkx
Rachid Riad
Liming Wang
Emmanuel Dupoux

Abstract
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.