arXiv:2205.03608

UniMorph 4.0: Universal Morphology

7 May 2022
Khuyagbaatar Batsuren
Omer Goldman
Salam Khalifa
Nizar Habash
Christo Kirov
Gábor Bella
Brian Leonard
Garrett Nicolai
Kyle Gorman
Yustinus Ghanggo Ate
Maria Ryskina
Sabrina J. Mielke
Elena Budianskaya
Charbel El-Khaissi
Tiago Pimentel
John Sylak-Glassman
William Lane
Mohit Raj
Matt Coler
Jaime Rafael Montoya Samame
Delio Siticonatzi Camaiteri
Benoît Sagot
Géraldine Walther
Didier López Francis
Arturo Oncevay
Juan López Bautista
Gema Celeste Silva Villegas
Lucas Torroba Hennigen
Adam Ek
David Guriel
Peter Dirix
Jean-Philippe Bernardy
Andrey Scherbakov
Aziyana Bayyr-ool
Antonios Anastasopoulos
Roberto Zariquiey
Karina Sheifer
Sofya Ganieva
Hilaria Cruz
Ritván Karahóǧa
Stella Markantonatou
George Pavlidis
Matvey Plugaryov
Elena Klyachko
Ali Salehi
Manaal Faruqui
Jatayu Baxi
Andrew Krizhanovsky
Natalia Krizhanovskaya
Elizabeth Salesky
Clara Vania
Sardana Ivanova
Jennifer C. White
Rowan Hall Maudslay
Josef Valvoda
Ran Zmigrod
Paula Czarnowska
Irene Nikkarinen
Aelita Salchak
Brijesh S. Bhatt
Christopher Straughn
Zoey Liu
Jonathan North Washington
Yuval Pinter
Duygu Ataman
Marcin Wolinski
Totok Suhardijanto
Sandra Kübler
Niklas Stoehr
Hossep Dolatian
Zahroh Nuriah
Shyam Ratan
Jason Eisner
Edoardo M. Ponti
Grant Aiton
Aryaman Arora
Richard J. Hatcher
Ritesh Kumar
Jeremiah Young
Daria Rodionova
Anastasia Yemelina
Taras Andrushko
Igor Marchenko
Polina Mashkovtseva
Alexandra Serova
Emily Tucker Prudhommeaux
Maria Nepomniashchaya
Fausto Giunchiglia
Eleanor Chodroff
Mans Hulden
Miikka Silfverberg
Arya D. McCarthy
David Yarowsky
Ryan Cotterell
Reut Tsarfaty
Ekaterina Vylomova
Abstract

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage, instantiated, normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the past two years, since McCarthy et al. (2020). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have made several improvements to the extraction pipeline to address issues such as missing gender and macron information. We have also amended the schema to use a hierarchical structure, which is needed for morphological phenomena such as multiple-argument agreement and case stacking, and added some missing morphological features to make the schema more inclusive. Since the last UniMorph release, we have also augmented the database with morpheme segmentation for 16 languages. Lastly, this release pushes toward including derivational morphology in UniMorph by enriching the data and annotation schema with instances of derivational processes from MorphyNet.
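To make the "type-level resource realizing that schema" concrete, here is a minimal sketch that reads inflection entries. It assumes the tab-separated layout of UniMorph data files: lemma, inflected form, and a feature bundle whose features are joined by semicolons (e.g. `V;PST`). The helper names `parse_unimorph` and `forms_with` and the sample rows are hypothetical illustrations. The sketch treats each bundle as a flat set, so it does not model the hierarchical features or morpheme segmentation added in this release.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Inflection:
    lemma: str
    form: str
    features: FrozenSet[str]  # flat feature set, e.g. {"V", "PST"}


def parse_unimorph(lines: Iterable[str]) -> List[Inflection]:
    """Parse lemma<TAB>form<TAB>FEAT;FEAT lines (hypothetical helper).

    Assumes flat, semicolon-separated feature bundles. Blank lines are
    skipped and any columns beyond the third are ignored.
    """
    entries: List[Inflection] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        if len(cols) < 3:
            raise ValueError(f"expected at least 3 tab-separated columns: {line!r}")
        lemma, form, feats = cols[0], cols[1], cols[2]
        entries.append(Inflection(lemma, form, frozenset(feats.split(";"))))
    return entries


def forms_with(entries: Iterable[Inflection], *required: str) -> List[str]:
    """Return the forms whose feature set contains every required feature."""
    req = set(required)
    return [e.form for e in entries if req <= e.features]


# Illustrative rows only; not copied from an actual UniMorph file.
sample = [
    "run\tran\tV;PST",
    "run\truns\tV;3;SG;PRS",
    "run\trunning\tV;V.PTCP;PRS",
]

if __name__ == "__main__":
    entries = parse_unimorph(sample)
    print(forms_with(entries, "V", "PST"))  # ['ran']
```

Treating each bundle as a set makes feature order irrelevant, which suits queries like "all present-tense forms". A reader working with the hierarchical annotations would need a real parser for the nested structure instead of a plain split on `;`.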
