Greed is All You Need: An Evaluation of Tokenizer Inference Methods

2 March 2024
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
arXiv: 2403.01289

Papers citing "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"

10 / 10 papers shown
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer
Ivan Vulić
Edoardo Ponti
229
0
0
25 Mar 2025
Splintering Nonconcatenative Languages for Better Tokenization
Bar Gazit
Shaltiel Shmidman
Avi Shmidman
Yuval Pinter
64
0
0
18 Mar 2025
Tokenization is Sensitive to Language Variation
Anna Wegmann
Dong Nguyen
David Jurgens
86
1
0
24 Feb 2025
Hit the Sweet Spot! Span-Level Ensemble for Large Language Models
Yangyifan Xu
Jianghao Chen
Junhong Wu
Jiajun Zhang
MoE
36
2
0
27 Sep 2024
Zero-Shot Tokenizer Transfer
Benjamin Minixhofer
Edoardo Ponti
Ivan Vulić
VLM
49
9
0
13 May 2024
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren
Ekaterina Vylomova
Verna Dankers
Tsetsuukhei Delgerbaatar
Omri Uzan
Yuval Pinter
Gábor Bella
40
10
0
20 Apr 2024
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Marco Cognetta
Tatsuya Hiraoka
Naoaki Okazaki
Rico Sennrich
Yuval Pinter
34
2
0
30 Mar 2024
Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement
Catherine Arnett
Pamela D. Rivière
Tyler A. Chang
Sean Trott
24
2
0
20 Mar 2024
Tokenization Is More Than Compression
Craig W. Schmidt
Varshini Reddy
Haoran Zhang
Alec Alameddine
Omri Uzan
Yuval Pinter
Chris Tanner
61
28
0
28 Feb 2024
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhehuai Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
718
6,750
0
26 Sep 2016