Free(): Learning to Forget in Malloc-Only Reasoning Models

8 February 2026

Yilun Zheng

Dongyang Ma

Tian Liang

Jiahao Xu

Xinting Huang

Lijie Chen

Haitao Mi

Yan Wang


Main: 9 pages, Bibliography: 3 pages, Appendix: 2 pages, 7 figures, 4 tables

Abstract

Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact and noise-free state.
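The alternation between reasoning and cleaning described above can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the paper's implementation: the names `reason_and_free`, `reason_step`, `is_obsolete`, and `clean_every` are hypothetical, the context is modeled as a plain list of text chunks, and a simple callable stands in for the Free-Module (which the abstract describes as a LoRA adapter).

```python
from typing import Callable, List

Chunk = str


def reason_and_free(
    prompt: str,
    reason_step: Callable[[List[Chunk]], Chunk],
    is_obsolete: Callable[[Chunk, List[Chunk]], bool],
    max_steps: int = 4,
    clean_every: int = 2,
) -> List[Chunk]:
    """Alternate between appending reasoning chunks and pruning obsolete ones.

    Hypothetical stand-ins: `reason_step` plays the role of the reasoning
    mode, `is_obsolete` plays the role of the cleaning mode.
    """
    context: List[Chunk] = [prompt]
    for step in range(1, max_steps + 1):
        # Reasoning mode: the context only grows ("malloc").
        context.append(reason_step(context))
        if step % clean_every == 0:
            # Cleaning mode: drop chunks judged obsolete ("free").
            # The prompt itself (index 0) is never pruned.
            snapshot = list(context)
            context = [snapshot[0]] + [
                c for c in snapshot[1:] if not is_obsolete(c, snapshot)
            ]
    return context


def make_toy_reasoner() -> Callable[[List[Chunk]], Chunk]:
    """Toy generator: odd-numbered steps are dead ends, even ones are useful."""
    counter = {"n": 0}

    def step(context: List[Chunk]) -> Chunk:
        counter["n"] += 1
        label = "dead-end" if counter["n"] % 2 else "useful"
        return f"{label} step {counter['n']}"

    return step


def toy_is_obsolete(chunk: Chunk, context: List[Chunk]) -> bool:
    """Toy cleaner: treat any chunk marked as a dead end as obsolete."""
    return chunk.startswith("dead-end")
```

Calling `reason_and_free("Q", make_toy_reasoner(), toy_is_obsolete)` keeps the prompt and the chunks the toy cleaner does not flag, so the context stays smaller than the four chunks a grow-only loop would have accumulated.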
