Free(): Learning to Forget in Malloc-Only Reasoning Models
Yilun Zheng
Dongyang Ma
Tian Liang
Jiahao Xu
Xinting Huang
Lijie Chen
Haitao Mi
Yan Wang
Main: 9 pages; Bibliography: 3 pages; Appendix: 2 pages; 7 figures; 4 tables
Abstract
Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact and noise-free state.
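The alternation between reasoning and cleaning described in the abstract can be pictured with a small toy loop. This is only a sketch under stated assumptions, not the paper's implementation: the abstract says the Free-Module is a LoRA adapter that identifies useless chunks, while here both modes are replaced by plain Python callables. All names (`run_with_forgetting`, `reason_step`, `select_useless`, `clean_every`) and the fixed cleaning schedule are hypothetical and chosen for illustration.

```python
from typing import Callable, List

Chunk = str


def run_with_forgetting(
    reason_step: Callable[[List[Chunk]], Chunk],
    select_useless: Callable[[List[Chunk]], List[int]],
    prompt: Chunk,
    max_steps: int = 8,
    clean_every: int = 2,
) -> List[Chunk]:
    """Alternate between appending reasoning chunks and pruning useless ones.

    Hypothetical stand-ins: `reason_step` plays the role of the reasoning
    mode, `select_useless` plays the role of the cleaning mode.
    """
    context: List[Chunk] = [prompt]
    for step in range(1, max_steps + 1):
        # Reasoning mode: the context only grows ("malloc").
        context.append(reason_step(context))
        # Cleaning mode (assumed here to run on a fixed schedule): "free".
        if step % clean_every == 0:
            drop = set(select_useless(context))
            drop.discard(0)  # never free the original prompt
            context = [c for i, c in enumerate(context) if i not in drop]
    return context


def make_toy_reasoner() -> Callable[[List[Chunk]], Chunk]:
    """Toy reasoner: every second chunk it emits is labelled redundant."""
    counter = {"n": 0}

    def reason(context: List[Chunk]) -> Chunk:
        counter["n"] += 1
        tag = "redundant" if counter["n"] % 2 == 0 else "useful"
        return f"{tag}: step {counter['n']}"

    return reason


def toy_select_useless(context: List[Chunk]) -> List[int]:
    """Toy cleaner: flag the indices of chunks labelled redundant."""
    return [i for i, c in enumerate(context) if c.startswith("redundant")]


final = run_with_forgetting(
    make_toy_reasoner(), toy_select_useless, "Q", max_steps=4, clean_every=2
)
print(final)
```

In this toy run four chunks are generated but only the two labelled useful survive next to the prompt, so the context stays compact instead of growing with every step. In a real system the selection would come from a learned module rather than a string prefix.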
