ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.11130
  4. Cited By
Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced
  Arabic Language Models

Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models

17 March 2024
M. Alrefaie
Nour Eldin Morsy
Nada Samir
ArXivPDFHTML

Papers citing "Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models"

17 / 17 papers shown
Title
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
V. Aribandi
Yi Tay
Tal Schuster
J. Rao
H. Zheng
...
Jianmo Ni
Jai Gupta
Kai Hui
Sebastian Ruder
Donald Metzler
MoE
83
215
0
22 Nov 2021
Charformer: Fast Character Transformers via Gradient-based Subword
  Tokenization
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
108
159
0
23 Jun 2021
Evaluating Various Tokenizers for Arabic Text Classification
Evaluating Various Tokenizers for Arabic Text Classification
Zaid Alyafeai
Maged S. Al-Shaibani
Mustafa Ghaleb
Irfan Ahmad
60
43
0
14 Jun 2021
ByT5: Towards a token-free future with pre-trained byte-to-byte models
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Linting Xue
Aditya Barua
Noah Constant
Rami Al-Rfou
Sharan Narang
Mihir Kale
Adam Roberts
Colin Raffel
93
504
0
28 May 2021
Joint Optimization of Tokenization and Downstream Model
Joint Optimization of Tokenization and Downstream Model
Tatsuya Hiraoka
Sho Takase
Kei Uchiumi
Atsushi Keyaki
Naoaki Okazaki
41
17
0
26 May 2021
A Survey on Transfer Learning in Natural Language Processing
A Survey on Transfer Learning in Natural Language Processing
Zaid Alyafeai
Maged S. Alshaibani
Irfan Ahmad
63
74
0
31 May 2020
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Kaj Bostrom
Greg Durrett
61
210
0
07 Apr 2020
AraBERT: Transformer-based Model for Arabic Language Understanding
AraBERT: Transformer-based Model for Arabic Language Understanding
Wissam Antoun
Fady Baly
Hazem M. Hajj
105
969
0
28 Feb 2020
From English To Foreign Languages: Transferring Pre-trained Language
  Models
From English To Foreign Languages: Transferring Pre-trained Language Models
Ke M. Tran
42
51
0
18 Feb 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
621
24,431
0
26 Jul 2019
A Call for Prudent Choice of Subword Merge Operations in Neural Machine
  Translation
A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation
Shuoyang Ding
Adithya Renduchintala
Kevin Duh
43
64
0
24 May 2019
FRAGE: Frequency-Agnostic Word Representation
FRAGE: Frequency-Agnostic Word Representation
Chengyue Gong
Di He
Xu Tan
Tao Qin
Liwei Wang
Tie-Yan Liu
OOD
59
144
0
18 Sep 2018
Subword Regularization: Improving Neural Network Translation Models with
  Multiple Subword Candidates
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
Taku Kudo
223
1,167
0
29 Apr 2018
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhiwen Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
894
6,788
0
26 Sep 2016
Improving Neural Machine Translation Models with Monolingual Data
Improving Neural Machine Translation Models with Monolingual Data
Rico Sennrich
Barry Haddow
Alexandra Birch
246
2,717
0
20 Nov 2015
Neural Machine Translation of Rare Words with Subword Units
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
215
7,735
0
31 Aug 2015
Efficient Estimation of Word Representations in Vector Space
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
668
31,489
0
16 Jan 2013
1