The capabilities of large language models (LLMs) have been enhanced by training on data that reflects human thought processes, such as the Chain-of-Thought format. However, evidence suggests that the conventional scheme of next-word prediction may not fully capture how humans learn to think. Inspired by how humans generalize mathematical reasoning, we propose ClozeMath, a new approach to fine-tuning LLMs for mathematical reasoning. ClozeMath uses a text-infilling task that predicts masked equations from a given solution, analogous to the cloze exercises used in human learning. Experiments on GSM8K, MATH, and GSM-Symbolic show that ClozeMath surpasses the strong baseline Masked Thought in performance and robustness under two test-time scaling decoding algorithms, Beam Search and Chain-of-Thought decoding. We also conduct an ablation study analyzing how various architectural and implementation choices affect our approach.
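To make the equation-infilling idea concrete, here is a minimal sketch of how a cloze-style training pair might be built from a GSM8K-style solution, where equations are annotated inside `<<...>>` calculator tags. The sentinel-token format and the choice to mask every equation are illustrative assumptions, not the paper's exact implementation.

```python
import re

# Equations in GSM8K solutions are wrapped in calculator annotations: <<expr=result>>
EQUATION_PATTERN = re.compile(r"<<([^>]*)>>")

def build_cloze_example(solution: str):
    """Replace each annotated equation with a sentinel token and collect
    the masked equations as the infilling target (T5-style span corruption
    is an assumed formulation here)."""
    targets = []

    def mask(match: re.Match) -> str:
        sentinel = f"<mask_{len(targets)}>"
        targets.append(f"{sentinel} {match.group(1)}")
        return sentinel

    masked_input = EQUATION_PATTERN.sub(mask, solution)
    target = " ".join(targets)
    return masked_input, target

solution = "She sells 16 - 3 - 4 = <<16-3-4=9>>9 eggs, earning 9 * 2 = <<9*2=18>>18 dollars."
masked_input, target = build_cloze_example(solution)
# masked_input: "She sells 16 - 3 - 4 = <mask_0>9 eggs, earning 9 * 2 = <mask_1>18 dollars."
# target:       "<mask_0> 16-3-4=9 <mask_1> 9*2=18"
```

During fine-tuning, the model would be trained to generate the target sequence from the masked input, so that it learns to fill in the equations rather than merely continue the text.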
@article{pham2025_2506.03763,
  title={ClozeMath: Improving Mathematical Reasoning in Language Models by Learning to Fill Equations},
  author={Quang Hieu Pham and Thuy Duong Nguyen and Tung Pham and Anh Tuan Luu and Dat Quoc Nguyen},
  journal={arXiv preprint arXiv:2506.03763},
  year={2025}
}