arXiv:2103.08052
Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language
14 March 2021
Bonaventure F. P. Dossou
Chris C. Emezue
Papers citing "Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language"
Impact of Tokenization on Language Models: An Analysis for Turkish
Cagri Toraman
E. Yilmaz
Furkan Şahinuç
Oguzhan Ozcelik
19 Apr 2022
MMTAfrica: Multilingual Machine Translation for African Languages
Chris C. Emezue
Bonaventure F. P. Dossou
08 Apr 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
20 Dec 2021
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal
Cynthia Gao
Vishrav Chaudhary
Peng-Jen Chen
Guillaume Wenzek
Da Ju
Sanjan Krishnan
Marc'Aurelio Ranzato
Francisco Guzman
Angela Fan
06 Jun 2021