
Structured Pruning for Diverse Best-of-N Reasoning Optimization

Main: 4 pages · Bibliography: 3 pages · Appendix: 5 pages · 6 figures · 4 tables
Abstract

Model pruning in transformer-based language models, traditionally viewed as a means of achieving computational savings, can also enhance a model's reasoning capabilities. In this work, we uncover a surprising phenomenon: selectively pruning certain attention heads improves reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, SPRINT identifies the pruned-head configurations that yield more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-N and random head selection strategies on the MATH500 and GSM8K datasets.
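The abstract describes SPRINT's selection step as aligning a question embedding with learned head embeddings to pick which head to prune. A minimal sketch of that idea, assuming cosine similarity as the alignment score (the function and variable names here are hypothetical, not taken from the paper):

```python
import numpy as np

def select_head_to_prune(question_emb, head_embs):
    """Hypothetical sketch: pick the (layer, head) whose learned
    embedding best aligns with the question embedding, measured
    by cosine similarity."""
    q = question_emb / np.linalg.norm(question_emb)
    best_key, best_score = None, -np.inf
    for key, emb in head_embs.items():
        h = emb / np.linalg.norm(emb)
        score = float(q @ h)
        if score > best_score:
            best_key, best_score = key, score
    return best_key

# Toy example: 2 layers x 2 heads with random "learned" embeddings.
rng = np.random.default_rng(0)
head_embs = {(l, h): rng.normal(size=8) for l in range(2) for h in range(2)}
question_emb = rng.normal(size=8)
layer, head = select_head_to_prune(question_emb, head_embs)
```

In the full method, the chosen (layer, head) would be masked out at inference time and the pruned model's answers would feed into a best-of-N selection; those details are beyond what the abstract specifies.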

@article{nguyen2025_2506.03978,
  title={Structured Pruning for Diverse Best-of-N Reasoning Optimization},
  author={Hieu Trung Nguyen and Bao Nguyen and Viet Anh Nguyen},
  journal={arXiv preprint arXiv:2506.03978},
  year={2025}
}