2204.04058
Cited By
Improving Tokenisation by Alternative Treatment of Spaces
8 April 2022
Edward Gow-Smith
Harish Tayyar Madabushi
Carolina Scarton
Aline Villavicencio
Papers citing
"Improving Tokenisation by Alternative Treatment of Spaces"
5 / 5 papers shown
Title
Tokenization is Sensitive to Language Variation
Anna Wegmann
Dong Nguyen
David Jurgens
84
1
0
24 Feb 2025
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Marco Cognetta
Tatsuya Hiraoka
Naoaki Okazaki
Rico Sennrich
Yuval Pinter
29
2
0
30 Mar 2024
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
38
46
0
14 Jul 2022
Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning
Stig-Arne Grönroos
Sami Virpioja
M. Kurimo
VLM
26
21
0
06 Mar 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018