arXiv: 2304.14780

Cited By
Training and Evaluation of a Multilingual Tokenizer for GPT-SW3
28 April 2023
Felix Stollenwerk
Papers citing "Training and Evaluation of a Multilingual Tokenizer for GPT-SW3" (6 papers)
Efficiently Adapting Pretrained Language Models To New Languages
Zoltan Csaki, Pian Pawakapan, Urmish Thakker, Qiantong Xu
09 Nov 2023
Continual Learning Under Language Shift
Evangelia Gogoulou, Timothée Lesort, Magnus Boman, Joakim Nivre
02 Nov 2023
Core Building Blocks: Next Gen Geo Spatial GPT Application
Ashley Fernandez, Swaraj Dube
17 Oct 2023
Tokenizer Choice For LLM Training: Negligible or Crucial?
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, ..., Malte Ostendorff, Samuel Weinbach, R. Sifa, Stefan Kesselheim, Nicolas Flores-Herr
12 Oct 2023
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych
31 Dec 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019