2012.15524
Fast WordPiece Tokenization
31 December 2020
Xinying Song
Alexandru Salcianu
Yang Song
Dave Dopson
Denny Zhou
Papers citing "Fast WordPiece Tokenization"
19 / 19 papers shown
Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss
Zhuoang Cai
Zehan Li
Yi Liu
Liyuan Guo
Yangqiu Song
36
0
0
24 Apr 2025
Annotative Indexing
Charles L. A. Clarke
7
0
0
09 Nov 2024
MotionGlot: A Multi-Embodied Motion Generation Model
Sudarshan Harithas
Srinath Sridhar
82
1
0
22 Oct 2024
SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization
Kohei Tsuji
Tatsuya Hiraoka
Yuchang Cheng
Tomoya Iwakura
47
1
0
10 Sep 2024
Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences
Patrick Haller
Lena S. Bolliger
Lena Ann Jäger
42
1
0
07 Jun 2024
Revisiting character-level adversarial attacks
Elias Abad Rocamora
Yongtao Wu
Fanghui Liu
Grigorios G. Chrysos
V. Cevher
AAML
39
3
0
07 May 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis
J. L. Bez
Suren Byna
62
0
0
16 Apr 2024
A Survey of Source Code Representations for Machine Learning-Based Cybersecurity Tasks
B.K. Casey
Joanna C. S. Santos
George Perry
63
5
0
15 Mar 2024
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLM
OCL
60
7
0
22 Feb 2024
Leveraging Domain Adaptation and Data Augmentation to Improve Qur'anic IR in English and Arabic
Vera Pavlova
31
2
0
05 Dec 2023
On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model
Nohil Park
Joonsuk Park
Kang Min Yoo
Sungroh Yoon
36
3
0
14 Nov 2023
DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew
Shaltiel Shmidman
Avi Shmidman
Moshe Koppel
30
7
0
31 Aug 2023
Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
Orevaoghene Ahia
Sachin Kumar
Hila Gonen
Jungo Kasai
David R. Mortensen
Noah A. Smith
Yulia Tsvetkov
53
82
0
23 May 2023
Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing
Tatsuya Hiraoka
Tomoya Iwakura
20
0
0
21 Apr 2023
Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction
Grace Yang
Mingzi Cao
L. Jiang
Xujin C. Liu
Alexander T. M. Cheung
Hannah Weiss
Davied Kurland
Kyunghyun Cho
Eric K. Oermann
LM&MA
24
3
0
13 Nov 2022
MaxMatch-Dropout: Subword Regularization for WordPiece
Tatsuya Hiraoka
54
8
0
09 Sep 2022
pNLP-Mixer: an Efficient all-MLP Architecture for Language
Francesco Fusco
Damian Pascual
Peter W. J. Staar
Diego Antognini
37
29
0
09 Feb 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
34
143
0
20 Dec 2021
Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye
Madian Khabsa
M. Lewis
Sinong Wang
Xiang Ren
Aaron Jaech
39
5
0
16 Oct 2021