Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
arXiv 2505.09738 · 14 May 2025
Shaurya Sharthak, Vinayak Pahalwan, Adithya Kamath, Adarsh Shirawalmath
Tags: CLL, VLM
Links: ArXiv · PDF · HTML
Papers citing "Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning" (7 of 7 papers shown)

SuperBPE: Space Travel for Language Models
Alisa Liu, J. Hayase, Valentin Hofmann, Sewoong Oh, Noah A. Smith, Yejin Choi
17 Mar 2025

LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language
Cagri Toraman
Tags: VLM
13 May 2024

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
L. Yu, Daniel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, M. Lewis
12 May 2023

WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models
Benjamin Minixhofer, Fabian Paischer, Navid Rekabsaz
13 Dec 2021

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner
Tags: AILaw
18 Apr 2021

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Tags: VLM, AI4CE, CLL
23 Apr 2020

Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, Alexandra Birch
31 Aug 2015