Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.14408
Cited By
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
22 April 2024
Kevin Slagle
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SpaceByte: Towards Deleting Tokenization from Large Language Modeling"
6 / 6 papers shown
Title
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
37
2
0
28 Oct 2024
Local Byte Fusion for Neural Machine Translation
Makesh Narsimhan Sreedhar
Xiangpeng Wan
Yu-Jie Cheng
Junjie Hu
27
4
0
23 May 2022
Consistent Accelerated Inference via Confident Adaptive Transformers
Tal Schuster
Adam Fisch
Tommi Jaakkola
Regina Barzilay
AI4TS
184
69
0
18 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
253
1,989
0
31 Dec 2020
Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press
Noah A. Smith
M. Lewis
230
89
0
31 Dec 2020
CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
Hicham El Boukkouri
Olivier Ferret
Thomas Lavergne
Hiroshi Noji
Pierre Zweigenbaum
Junichi Tsujii
71
156
0
20 Oct 2020
1