ResearchTrend.AI

Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models

4 October 2024
Gunjan Balde
Soumyadeep Roy
Mainack Mondal
Niloy Ganguly

Papers citing "Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models"

7 / 7 papers shown
AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain
Jimin Hong
Taehee Kim
Hyesu Lim
Jaegul Choo
38
25
0
26 Oct 2021
Efficient Domain Adaptation of Language Models via Adaptive Tokenization
Vin Sachidananda
Jason S Kessler
Yi-An Lai
64
36
0
15 Sep 2021
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
143
255
0
31 Dec 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
400
18,913
0
13 Feb 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
703
24,572
0
26 Jul 2019
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo
John Richardson
214
3,534
0
19 Aug 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhifeng Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
918
6,799
0
26 Sep 2016