Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models

4 October 2024

Papers cited by "Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models"


AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain
Jimin Hong, Taehee Kim, Hyesu Lim, Jaegul Choo
38 25 0 | 26 Oct 2021

Efficient Domain Adaptation of Language Models via Adaptive Tokenization
Vin Sachidananda, Jason S Kessler, Yi-An Lai
64 36 0 | 15 Sep 2021

How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych
143 255 0 | 31 Dec 2020

A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey E. Hinton
SSL | 400 18,913 0 | 13 Feb 2020

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat | 703 24,572 0 | 26 Jul 2019

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo, John Richardson
214 3,534 0 | 19 Aug 2018

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
AIMat | 918 6,799 0 | 26 Sep 2016