2406.16829
Understanding and Mitigating Tokenization Bias in Language Models
24 June 2024
Buu Phan, Marton Havasi, Matthew Muckley, Karen Ullrich
Papers citing "Understanding and Mitigating Tokenization Bias in Language Models"
SuperBPE: Space Travel for Language Models
Alisa Liu, J. Hayase, Valentin Hofmann, Sewoong Oh, Noah A. Smith, Yejin Choi
51
3
0
17 Mar 2025
Toward a Theory of Tokenization in LLMs
Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran
LLMAG
29
5
0
12 Apr 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty
51
22
0
10 Mar 2024
Tokenization Is More Than Compression
Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner
61
28
0
28 Feb 2024
Getting the most out of your tokenizer for pre-training and domain adaptation
Gautier Dagan, Gabriele Synnaeve, Baptiste Rozière
34
20
0
01 Feb 2024