ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.09572
  4. Cited By
How do different tokenizers perform on downstream tasks in scriptio
  continua languages?: A case study in Japanese

How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese

16 June 2023
T. Fujii
Koki Shibata
Atsuki Yamaguchi
Terufumi Morishita
Yasuhiro Sogawa
ArXivPDFHTML

Papers citing "How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese"

8 / 8 papers shown
Title
Overcoming Vocabulary Constraints with Pixel-level Fallback
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz
Hendra Setiawan
Stephan Peitz
Yova Kementchedjhieva
43
0
0
02 Apr 2025
Efficient Continual Pre-training of LLMs for Low-resource Languages
Efficient Continual Pre-training of LLMs for Low-resource Languages
Arijit Nag
Soumen Chakrabarti
Animesh Mukherjee
Niloy Ganguly
82
0
0
13 Dec 2024
Responsible Multilingual Large Language Models: A Survey of Development,
  Applications, and Societal Impact
Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact
Junhua Liu
Bin Fu
LRM
31
1
0
23 Oct 2024
Efficacy of ByT5 in Multilingual Translation of Biblical Texts for
  Underrepresented Languages
Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages
Corinne Aars
Lauren Adams
Xiaokan Tian
Zhaoyu Wang
Colton Wismer
Jason Wu
Pablo Rivas
Korn Sooksatra
Matthew Fendt
17
0
0
22 May 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and
  Frontiers
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Hai-Tao Zheng
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
55
36
0
07 Apr 2024
How Important Is Tokenization in French Medical Masked Language Models?
How Important Is Tokenization in French Medical Masked Language Models?
Yanis Labrak
Adrien Bazoge
B. Daille
Mickael Rouvier
Richard Dufour
41
1
0
22 Feb 2024
An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient
  Language Model Inference
An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference
Atsuki Yamaguchi
Aline Villavicencio
Nikolaos Aletras
27
7
0
16 Feb 2024
How Good is Your Tokenizer? On the Monolingual Performance of
  Multilingual Language Models
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
80
235
0
31 Dec 2020
1