L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression

21 December 2024
Junxuan Zhang, Zhengxue Cheng, Yan Zhao, Shihao Wang, Dajiang Zhou, Guo Lu, Li Song
arXiv:2412.16642 (v2, latest) · PDF · HTML · GitHub (12★)
Main: 7 pages · Bibliography: 2 pages · Appendix: 2 pages · 7 figures · 12 tables
Abstract

Learning-based probabilistic models can be combined with an entropy coder for data compression. However, due to their high complexity, learning-based models have largely been overlooked as practical text compressors. To address this, our work pursues a low-complexity design while maintaining compression performance, and we introduce a novel Learned Lossless Low-complexity Text Compression method (L3TC). First, we conduct extensive experiments demonstrating that RWKV models achieve the fastest decoding speed at a moderate compression ratio, making RWKV the most suitable backbone for our method. Second, we propose an outlier-aware tokenizer that uses a limited vocabulary to cover frequent tokens while allowing outliers to bypass prediction and encoding. Third, we propose a novel high-rank reparameterization strategy that enhances learning capability during training without increasing complexity during inference. Experimental results validate that our method achieves a 48% bit saving compared to the gzip compressor. In addition, L3TC offers compression performance comparable to other learned compressors, with a 50× reduction in model parameters. More importantly, L3TC is the fastest of all learned compressors, providing real-time decoding speeds of up to megabytes per second.
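As background on the abstract's first sentence: an entropy coder (for example, an arithmetic coder) driven by a model's next-symbol distribution spends close to -log2 p(symbol) bits per symbol, so a better predictor directly means a smaller file. Below is a minimal, illustrative sketch of that accounting using a toy adaptive byte model; nothing here is from the paper's code.

```python
import math
from collections import Counter

def ideal_code_length_bits(data: bytes, alphabet: int = 256) -> float:
    """Bits an ideal entropy coder would emit when driven by a simple
    adaptive order-0 byte model: each symbol costs -log2 p(symbol),
    with p estimated from the counts seen so far (Laplace smoothing)."""
    counts, total, bits = Counter(), 0, 0.0
    for b in data:
        p = (counts[b] + 1) / (total + alphabet)  # smoothed predictive probability
        bits += -math.log2(p)                     # ideal cost of coding this symbol
        counts[b] += 1
        total += 1
    return bits

sample = b"learned compression couples a predictor with an entropy coder " * 4
print(f"{ideal_code_length_bits(sample) / 8:.0f} coded bytes vs {len(sample)} raw bytes")
```

Swapping the toy counts for a trained model's predicted distribution is the substitution the paper studies; the entropy coder itself is unchanged, which is why predictor complexity dominates compression and decompression speed.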

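The outlier-aware tokenizer is described only at this level of detail: a small vocabulary covers frequent tokens, and anything outside it bypasses the learned model entirely. The following is a hypothetical sketch of such a bypass; the escape ID and byte-level fallback are my assumptions, not the paper's actual format.

```python
FREQUENT = {"the": 0, "of": 1, "and": 2, "to": 3}  # tiny in-vocab table (illustrative)
ESCAPE = len(FREQUENT)                             # reserved ID: "literal bytes follow"

def tokenize(words):
    """Map frequent words to vocabulary IDs; out-of-vocabulary words get an
    escape ID plus their raw bytes, so outliers skip the learned model and
    the entropy coder carries them literally."""
    stream = []
    for w in words:
        if w in FREQUENT:
            stream.append(("id", FREQUENT[w]))
        else:
            stream.append(("escape", ESCAPE, w.encode("utf-8")))
    return stream

print(tokenize("the entropy of RWKV".split()))
# [('id', 0), ('escape', 4, b'entropy'), ('id', 1), ('escape', 4, b'RWKV')]
```

The design trade-off is that a small vocabulary keeps the model's output layer (and thus per-token prediction cost) low, at the price of spending uncompressed bits on rare tokens.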
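The third contribution follows a familiar reparameterization pattern: train a base weight together with a parallel factored branch for extra capacity, then fold the branch into the base weight so inference is a single matmul. The sketch below illustrates that pattern under my own assumptions about the branch structure; the exact form L3TC uses may differ.

```python
import numpy as np

class ReparamLinear:
    """Training: y = x @ (W + A @ B). After merge(): y = x @ W',
    where W' = W + A @ B, so inference costs one plain matmul."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_in, d_out)) * 0.02
        # Auxiliary branch trained jointly with W; rank > d_in makes it "high-rank".
        self.A = rng.normal(size=(d_in, rank)) * 0.02
        self.B = rng.normal(size=(rank, d_out)) * 0.02
        self.merged = False

    def forward(self, x):
        if self.merged:
            return x @ self.W                      # inference path: no extra cost
        return x @ self.W + (x @ self.A) @ self.B  # training path: extra capacity

    def merge(self):
        self.W = self.W + self.A @ self.B          # fold the branch into W
        self.A = self.B = None
        self.merged = True

layer = ReparamLinear(8, 8, rank=32)
x = np.ones((1, 8))
y_train = layer.forward(x)
layer.merge()
assert np.allclose(y_train, layer.forward(x))      # identical function, cheaper inference
```

Because the merged weight computes exactly the same function, the extra capacity is free at decode time, which is consistent with the abstract's claim of improved training without added inference complexity.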