2105.12410
Cited By
Joint Optimization of Tokenization and Downstream Model
26 May 2021
Tatsuya Hiraoka
Sho Takase
Kei Uchiumi
Atsushi Keyaki
Naoaki Okazaki
Papers citing
"Joint Optimization of Tokenization and Downstream Model"
8 / 8 papers shown
Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing
Tatsuya Hiraoka
Tomoya Iwakura
20
0
0
21 Apr 2023
Elementwise Language Representation
Du-Yeong Kim
Jeeeun Kim
41
0
0
27 Feb 2023
Extending the Subwording Model of Multilingual Pretrained Models for New Languages
K. Imamura
Eiichiro Sumita
VLM
29
3
0
29 Nov 2022
Incorporating Context into Subword Vocabularies
Shaked Yehezkel
Yuval Pinter
47
8
0
13 Oct 2022
MaxMatch-Dropout: Subword Regularization for WordPiece
Tatsuya Hiraoka
54
8
0
09 Sep 2022
Impact of Tokenization on Language Models: An Analysis for Turkish
Cagri Toraman
E. Yilmaz
Furkan Şahinuç
Oguzhan Ozcelik
38
74
0
19 Apr 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
34
143
0
20 Dec 2021
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
51
153
0
23 Jun 2021