2012.06262
Cited By
Morphology Matters: A Multilingual Language Modeling Analysis
11 December 2020
Hyunji Hayley Park
Katherine J. Zhang
Coleman Haley
K. Steimel
Han Liu
Lane Schwartz
Papers citing "Morphology Matters: A Multilingual Language Modeling Analysis"
19 / 19 papers shown
Limitations of Religious Data and the Importance of the Target Domain: Towards Machine Translation for Guinea-Bissau Creole
Jacqueline Rowe
Edward Gow-Smith
Mark Hepple
49
0
0
03 Apr 2025
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
Ehsaneddin Asgari
Yassine El Kheir
Mohammad Ali Sadraei Javaheri
58
0
0
02 Feb 2025
Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5
Thao Anh Dang
Limor Raviv
Lukas Galke
25
1
0
15 Oct 2024
Recent advancements in computational morphology : A comprehensive survey
Jatayu Baxi
Brijesh S. Bhatt
37
1
0
08 Jun 2024
A Morphology-Based Investigation of Positional Encodings
Poulami Ghosh
Shikhar Vashishth
Raj Dabre
Pushpak Bhattacharyya
26
1
0
06 Apr 2024
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz
Terra Blevins
Hila Gonen
Orevaoghene Ahia
Luke Zettlemoyer
30
13
0
15 Mar 2024
Analyzing Cognitive Plausibility of Subword Tokenization
Lisa Beinborn
Yuval Pinter
29
17
0
20 Oct 2023
Tokenizer Choice For LLM Training: Negligible or Crucial?
Mehdi Ali
Michael Fromm
Klaudia Thellmann
Richard Rutmann
Max Lübbering
...
Malte Ostendorff
Samuel Weinbach
R. Sifa
Stefan Kesselheim
Nicolas Flores-Herr
23
47
0
12 Oct 2023
Sentence Embedding Models for Ancient Greek Using Multilingual Knowledge Distillation
Kevin Krahn
D. Tate
Andrew C. Lamicela
22
4
0
24 Aug 2023
Tokenization with Factorized Subword Encoding
David Samuel
Lilja Øvrelid
38
1
0
13 Jun 2023
Effects of sub-word segmentation on performance of transformer language models
Jue Hou
Anisia Katinskaia
Anh Vu
R. Yangarber
13
4
0
09 May 2023
Average Is Not Enough: Caveats of Multilingual Evaluation
Matúš Pikuliak
Marian Simko
19
3
0
03 Jan 2023
Measuring Geographic Performance Disparities of Offensive Language Classifiers
Brandon Lwowski
P. Rad
Anthony Rios
42
5
0
15 Sep 2022
Morphological Processing of Low-Resource Languages: Where We Are and What's Next
Adam Wiemerslage
Miikka Silfverberg
Changbing Yang
Arya D. McCarthy
Garrett Nicolai
Eliana Colunga
Katharina Kann
28
12
0
16 Mar 2022
Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages
C.M. Downey
Shannon Drizin
Levon Haroutunian
Shivin Thukral
23
2
0
16 Oct 2021
You should evaluate your language model on marginal likelihood over tokenisations
Kris Cao
Laura Rimell
31
23
0
06 Sep 2021
Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction
Maria Ryskina
Eduard H. Hovy
Taylor Berg-Kirkpatrick
Matthew R. Gormley
24
2
0
24 Jun 2021
A multilabel approach to morphosyntactic probing
Naomi Tachikawa Shapiro
Amandalynne Paullada
Shane Steinert-Threlkeld
34
10
0
17 Apr 2021
A Masked Segmental Language Model for Unsupervised Natural Language Segmentation
C.M. Downey
Fei Xia
Gina-Anne Levow
Shane Steinert-Threlkeld
11
13
0
16 Apr 2021