2410.03258
Cited By
Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models
4 October 2024
Gunjan Balde
Soumyadeep Roy
Mainack Mondal
Niloy Ganguly
Github (7★)
Papers citing
"Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models"
AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain
Jimin Hong
Taehee Kim
Hyesu Lim
Jaegul Choo
38
25
0
26 Oct 2021
Efficient Domain Adaptation of Language Models via Adaptive Tokenization
Vin Sachidananda
Jason S Kessler
Yi-An Lai
64
36
0
15 Sep 2021
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
143
255
0
31 Dec 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
400
18,913
0
13 Feb 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
703
24,572
0
26 Jul 2019
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo
John Richardson
214
3,534
0
19 Aug 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhifeng Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
918
6,799
0
26 Sep 2016