2409.04599
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
6 September 2024
Pavel Chizhov
Catherine Arnett
Elizaveta Korotkova
Ivan P. Yamshchikov
Papers citing "BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training"
Toward a Theory of Tokenization in LLMs
Nived Rajaraman
Jiantao Jiao
Kannan Ramchandran
LLMAG
24
19
0
12 Apr 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
135
358
0
01 Feb 2024
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016