Joint Optimization of Tokenization and Downstream Model

26 May 2021

Papers citing "Joint Optimization of Tokenization and Downstream Model"


Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing. Tatsuya Hiraoka, Tomoya Iwakura. 21 Apr 2023.
Elementwise Language Representation. Du-Yeong Kim, Jeeeun Kim. 27 Feb 2023.
Extending the Subwording Model of Multilingual Pretrained Models for New Languages. K. Imamura, Eiichiro Sumita. 29 Nov 2022.
Incorporating Context into Subword Vocabularies. Shaked Yehezkel, Yuval Pinter. 13 Oct 2022.
MaxMatch-Dropout: Subword Regularization for WordPiece. Tatsuya Hiraoka. 09 Sep 2022.
Impact of Tokenization on Language Models: An Analysis for Turkish. Cagri Toraman, E. Yilmaz, Furkan Şahinuç, Oguzhan Ozcelik. 19 Apr 2022.
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP. Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, ..., Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, Samson Tan. 20 Dec 2021.
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization. Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler. 23 Jun 2021.