2305.14214
CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
23 May 2023
Benjamin Minixhofer
Jonas Pfeiffer
Ivan Vulić
Papers citing "CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models"
Title
Unsupervised Morphological Tree Tokenizer
Qingyang Zhu
Xiang Hu
Pengyu Ji
Wei Wu
Kewei Tu
31
0
0
21 Jun 2024
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren
Ekaterina Vylomova
Verna Dankers
Tsetsuukhei Delgerbaatar
Omri Uzan
Yuval Pinter
Gábor Bella
27
9
0
20 Apr 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
38
14
0
02 Mar 2024
Effects of sub-word segmentation on performance of transformer language models
Jue Hou
Anisia Katinskaia
Anh Vu
R. Yangarber
13
4
0
09 May 2023
(Un)solving Morphological Inflection: Lemma Overlap Artificially Inflates Models' Performance
Omer Goldman
David Guriel
Reut Tsarfaty
92
28
0
12 Aug 2021
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
80
235
0
31 Dec 2020