The capabilities of large language models (LLMs) have been enhanced by training on data that reflects human thought processes, such as the Chain-of-Thought format. However, evidence suggests that the conventional scheme of next-word prediction may not fully capture how humans learn to think. Inspired by how humans generalize mathematical reasoning, we propose ClozeMath, a new approach to fine-tuning LLMs for mathematical reasoning. ClozeMath uses a text-infilling task that predicts masked equations from a given solution, analogous to the cloze exercises used in human learning. Experiments on GSM8K, MATH, and GSM-Symbolic show that ClozeMath surpasses the strong baseline Masked Thought in performance and robustness under two test-time scaling decoding algorithms, Beam Search and Chain-of-Thought decoding. We also conduct an ablation study analyzing how various architectural and implementation choices affect our approach.
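To make the equation-infilling idea concrete, here is a minimal sketch of how a cloze-style training pair might be built from a GSM8K-style solution, where equations are annotated inside `<<...>>` calculator tags. The sentinel-token format and the choice to mask every equation are illustrative assumptions, not the paper's exact implementation.

```python
import re

# Equations in GSM8K solutions are wrapped in calculator annotations: <<expr=result>>
EQUATION_PATTERN = re.compile(r"<<([^>]*)>>")

def build_cloze_example(solution: str):
    """Replace each annotated equation with a sentinel token and collect
    the masked equations as the infilling target (T5-style span corruption
    is an assumed formulation here)."""
    targets = []

    def mask(match: re.Match) -> str:
        sentinel = f"<mask_{len(targets)}>"
        targets.append(f"{sentinel} {match.group(1)}")
        return sentinel

    masked_input = EQUATION_PATTERN.sub(mask, solution)
    target = " ".join(targets)
    return masked_input, target

solution = "She sells 16 - 3 - 4 = <<16-3-4=9>>9 eggs, earning 9 * 2 = <<9*2=18>>18 dollars."
masked_input, target = build_cloze_example(solution)
# masked_input: "She sells 16 - 3 - 4 = <mask_0>9 eggs, earning 9 * 2 = <mask_1>18 dollars."
# target:       "<mask_0> 16-3-4=9 <mask_1> 9*2=18"
```

During fine-tuning, the model would be trained to generate the target sequence from the masked input, so that it learns to fill in the equations rather than merely continue the text.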
@article{pham2025_2506.03763,
  title={ClozeMath: Improving Mathematical Reasoning in Language Models by Learning to Fill Equations},
  author={Quang Hieu Pham and Thuy Duong Nguyen and Tung Pham and Anh Tuan Luu and Dat Quoc Nguyen},
  journal={arXiv preprint arXiv:2506.03763},
  year={2025}
}