2210.07111
Cited By
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models
13 October 2022
Jimin Sun, Patrick Fernandes, Xinyi Wang, Graham Neubig
Papers citing "A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models"
3 / 3 papers shown
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz, Terra Blevins, Hila Gonen, Orevaoghene Ahia, Luke Zettlemoyer
30
13
0
15 Mar 2024
Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov
53
82
0
23 May 2023
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych
80
235
0
31 Dec 2020