arXiv:2109.05772
Wine is Not v i n. -- On the Compatibility of Tokenizations Across Languages
13 September 2021
Antonis Maronikolakis
Philipp Dufter
Hinrich Schütze
ArXiv
Papers citing "Wine is Not v i n. -- On the Compatibility of Tokenizations Across Languages" (9 papers)
A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation
Francois Meyer
Jan Buys
39
1
0
29 Mar 2024
Analyzing Cognitive Plausibility of Subword Tokenization
Lisa Beinborn
Yuval Pinter
29
17
0
20 Oct 2023
Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models
Phillip Rust
Anders Søgaard
33
3
0
17 Aug 2023
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models
Jimin Sun
Patrick Fernandes
Xinyi Wang
Graham Neubig
40
9
0
13 Oct 2022
Incorporating Context into Subword Vocabularies
Shaked Yehezkel
Yuval Pinter
47
8
0
13 Oct 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
38
46
0
14 Jul 2022
Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages
Vaidehi Patil
Partha P. Talukdar
Sunita Sarawagi
24
21
0
03 Mar 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
34
143
0
20 Dec 2021
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
80
235
0
31 Dec 2020