Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.07185
Cited By
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
12 May 2023
L. Yu
Daniel Simig
Colin Flaherty
Armen Aghajanyan
Luke Zettlemoyer
M. Lewis
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers"
50 / 77 papers shown
Title
Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
Shaurya Sharthak
Vinayak Pahalwan
Adithya Kamath
Adarsh Shirawalmath
CLL
VLM
45
0
0
14 May 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
52
2
0
11 Apr 2025
Cognitive Memory in Large Language Models
Lianlei Shan
Shixian Luo
Zezhou Zhu
Yu Yuan
Yong Wu
LLMAG
KELM
160
1
0
03 Apr 2025
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz
Hendra Setiawan
Stephan Peitz
Yova Kementchedjhieva
43
0
0
02 Apr 2025
Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer
Ivan Vulić
E. Ponti
151
0
0
25 Mar 2025
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
45
3
0
17 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
68
0
0
17 Mar 2025
Scaling Embedding Layers in Language Models
Da Yu
Edith Cohen
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Daogao Liu
Chiyuan Zhang
82
0
0
03 Feb 2025
PixelWorld: Towards Perceiving Everything as Pixels
Zhiheng Lyu
Xueguang Ma
Wenhu Chen
145
0
0
31 Jan 2025
Scaling Particle Collision Data Analysis
Hengkui Wu
Panpan Chi
Yongfeng Zhu
Liujiang Liu
Shuyang Hu
...
Yingsi Xin
Bruce Liu
Dahao Liang
Xiaojun Jia
Manqi Ruan
79
0
0
28 Nov 2024
MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation
Langlin Huang
Mengyu Bu
Yang Feng
33
0
0
03 Nov 2024
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
37
2
0
28 Oct 2024
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
Buu Phan
Brandon Amos
Itai Gat
Marton Havasi
Matthew Muckley
Karen Ullrich
47
1
0
11 Oct 2024
Egalitarian Language Representation in Language Models: It All Begins with Tokenizers
Menan Velayuthan
Kengatharaiyer Sarveswaran
40
5
0
17 Sep 2024
Adaptive Large Language Models By Layerwise Attention Shortcuts
Prateek Verma
Mert Pilanci
KELM
OffRL
55
0
0
17 Sep 2024
Where is the signal in tokenization space?
Renato Lui Geh
Honghua Zhang
Kareem Ahmed
Benjie Wang
Guy Van den Broeck
27
4
0
16 Aug 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Chaofan Tao
Qian Liu
Longxu Dou
Niklas Muennighoff
Zhongwei Wan
Ping Luo
Min-Bin Lin
Ngai Wong
PILM
60
46
0
18 Jul 2024
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Chenze Shao
Fandong Meng
Jie Zhou
49
1
0
17 Jul 2024
Genomic Language Models: Opportunities and Challenges
Gonzalo Benegas
Chengzhong Ye
C. Albors
Jianan Canal Li
Yun S. Song
AI4CE
LM&MA
ELM
43
18
0
16 Jul 2024
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
Orevaoghene Ahia
Sachin Kumar
Hila Gonen
Valentin Hoffman
Tomasz Limisiewicz
Yulia Tsvetkov
Noah A. Smith
51
4
0
11 Jul 2024
B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory
L. Zancato
Arjun Seshadri
Yonatan Dukler
Aditya Golatkar
Yantao Shen
Benjamin Bowman
Matthew Trager
Alessandro Achille
Stefano Soatto
39
8
0
08 Jul 2024
Understanding and Mitigating Tokenization Bias in Language Models
Buu Phan
Marton Havasi
Matthew Muckley
Karen Ullrich
44
3
0
24 Jun 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
43
14
0
12 Jun 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho
Sangmin Bae
Taehyeon Kim
Hyunjik Jo
Yireun Kim
Tal Schuster
Adam Fisch
James Thorne
Se-Young Yun
45
8
0
04 Jun 2024
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Yongxin Zhu
Dan Su
Liqiang He
Linli Xu
Dong Yu
33
5
0
03 Jun 2024
Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation
Langlin Huang
Yang Feng
33
1
0
29 May 2024
Compressed-Language Models for Understanding Compressed File Formats: a JPEG Exploration
Juan C. Pérez
Alejandro Pardo
Mattia Soldan
Hani Itani
Juan Carlos León Alcázar
Guohao Li
26
2
0
27 May 2024
Zero-Shot Tokenizer Transfer
Benjamin Minixhofer
E. Ponti
Ivan Vulić
VLM
44
9
0
13 May 2024
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
Sander Land
Max Bartolo
31
21
0
08 May 2024
In-Context Learning with Long-Context Models: An In-Depth Exploration
Amanda Bertsch
Maor Ivgi
Uri Alon
Jonathan Berant
Matthew R. Gormley
Matthew R. Gormley
Graham Neubig
ReLM
AIMat
93
64
0
30 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
37
3
0
22 Apr 2024
Adaptive Patching for High-resolution Image Segmentation with Transformers
Enzhi Zhang
Isaac Lyngaas
Peng Chen
Xiao Wang
Jun Igarashi
Yuankai Huo
M. Wahib
M. Munetomo
MedIm
32
1
0
15 Apr 2024
Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
Zhiqing Hong
Rongjie Huang
Xize Cheng
Yongqi Wang
Ruiqi Li
Fuming You
Zhou Zhao
Zhimeng Zhang
28
7
0
14 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
53
27
0
12 Apr 2024
On the Effect of (Near) Duplicate Subwords in Language Modelling
Anton Schäfer
Thomas Hofmann
Imanol Schlag
Tiago Pimentel
42
1
0
09 Apr 2024
Neural Multimodal Topic Modeling: A Comprehensive Evaluation
Felipe González-Pizarro
Giuseppe Carenini
VGen
45
1
0
26 Mar 2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Yongqi Wang
Ruofan Hu
Rongjie Huang
Zhiqing Hong
Ruiqi Li
Wenrui Liu
Fuming You
Tao Jin
Zhou Zhao
43
11
0
18 Mar 2024
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz
Terra Blevins
Hila Gonen
Orevaoghene Ahia
Luke Zettlemoyer
30
13
0
15 Mar 2024
Beyond Language Models: Byte Models are Digital World Simulators
Shangda Wu
Xu Tan
Zili Wang
Rui Wang
Xiaobing Li
Maosong Sun
35
12
0
29 Feb 2024
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
Johnathan Xie
Yoonho Lee
Annie S. Chen
Chelsea Finn
25
3
0
22 Feb 2024
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
Aaditya K. Singh
DJ Strouse
40
46
0
22 Feb 2024
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
35
28
0
20 Feb 2024
Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge
Julien Delile
Srayanta Mukherjee
Anton Van Pamel
Leonid Zhukov
RALM
21
18
0
19 Feb 2024
MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron
Sertan Girgin
Mauro Verzetti
Damien Vincent
Matej Kastelic
...
Olivier Pietquin
Matthieu Geist
Léonard Hussenot
Neil Zeghidour
A. Agostinelli
39
17
0
06 Feb 2024
MambaByte: Token-free Selective State Space Model
Junxiong Wang
Tushaar Gangavarapu
Jing Nathan Yan
Alexander M. Rush
Mamba
44
35
0
24 Jan 2024
MaLA-500: Massive Language Adaptation of Large Language Models
Peiqin Lin
Shaoxiong Ji
Jörg Tiedemann
André F. T. Martins
Hinrich Schütze
ELM
31
15
0
24 Jan 2024
Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery
Caleb Robinson
Isaac Corley
Anthony Ortiz
Rahul Dodhia
J. L. Ferres
Peyman Najafirad
19
0
0
12 Jan 2024
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Yintao Tai
Xiyang Liao
Alessandro Suglia
Antonio Vergari
MLLM
26
7
0
06 Jan 2024
On the Long Range Abilities of Transformers
Itamar Zimerman
Lior Wolf
27
7
0
28 Nov 2023
1
2
Next