Training LLMs over Neurally Compressed Text (arXiv:2404.03626)

4 April 2024
Brian Lester, Jaehoon Lee, A. Alemi, Jeffrey Pennington, Adam Roberts, Jascha Narain Sohl-Dickstein, Noah Constant

Papers citing "Training LLMs over Neurally Compressed Text"

13 papers shown:

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo
10 Feb 2025

L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression
J. Zhang, Zhengxue Cheng, Yan Zhao, Shihao Wang, Dajiang Zhou, Guo Lu, Li-Na Song
21 Dec 2024

When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
Vivek Ramanujan, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer, Ali Farhadi
20 Dec 2024

Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
David Heurtel-Depeiges, Anian Ruoss, Joel Veness, Tim Genewein
07 Oct 2024

Compressed-Language Models for Understanding Compressed File Formats: a JPEG Exploration
Juan C. Pérez, Alejandro Pardo, Mattia Soldan, Hani Itani, Juan Carlos León Alcázar, Guohao Li
27 May 2024

SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
22 Apr 2024

Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty
10 Mar 2024

Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Luke Vilnis, Yury Zemlyanskiy, Patrick C. Murray, Alexandre Passos, Sumit Sanghai
18 Oct 2022

Sequence Length is a Domain: Length-based Overfitting in Transformer Models
Dusan Varis, Ondrej Bojar
15 Sep 2021

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
31 Dec 2020

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020