
SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling

Md Imbesat Hassan Rizvi
Xiaodan Zhu
Iryna Gurevych
Main: 7 pages, Bibliography: 3 pages, Appendix: 4 pages; 5 figures, 11 tables
Abstract

Process or step-wise supervision has played a crucial role in advancing complex multi-step reasoning capabilities of Large Language Models (LLMs). However, efficient, high-quality automated process annotation remains a significant challenge. To address this, we introduce Single-Pass Annotation with Reference-Guided Evaluation (SPARE), a novel structured framework that enables single-pass, per-step annotation by aligning each solution step to one or multiple steps in a reference solution, accompanied by explicit reasoning for evaluation. We show that reference-guided step-level evaluation effectively facilitates process supervision on four datasets spanning three domains: mathematical reasoning, multi-hop compositional question answering, and spatial reasoning. We demonstrate that SPARE, when compared to baselines, improves reasoning performance when used for: (1) fine-tuning models in an offline RL setup for inference-time greedy-decoding, and (2) training reward models for ranking/aggregating multiple LLM-generated outputs. Additionally, SPARE achieves competitive performance on challenging mathematical datasets while offering 2.6 times greater efficiency, requiring only 38% of the runtime, compared to tree search-based automatic annotation. The codebase, along with a trained SPARE-PRM model, is publicly released to facilitate further research and reproducibility.
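
To make the annotation scheme concrete, below is a minimal Python sketch of single-pass, reference-guided step annotation as described in the abstract. It is an illustrative assumption, not the paper's released implementation: the prompt wording, the binary label set, and the call_judge_llm client are hypothetical placeholders.

# Minimal sketch of SPARE-style single-pass, reference-guided step annotation.
# Hypothetical: prompt wording, label set, and `call_judge_llm` are assumptions,
# not the authors' released code.
import json
from typing import Callable

LABELS = ("correct", "incorrect")  # assumed binary step labels

def build_annotation_prompt(question: str,
                            reference_steps: list[str],
                            candidate_steps: list[str]) -> str:
    """Build one prompt covering all candidate steps, so the judge LLM is
    called only once per solution (the "single pass")."""
    ref = "\n".join(f"R{i + 1}. {s}" for i, s in enumerate(reference_steps))
    cand = "\n".join(f"C{i + 1}. {s}" for i, s in enumerate(candidate_steps))
    return (
        f"Question:\n{question}\n\n"
        f"Reference solution steps:\n{ref}\n\n"
        f"Candidate solution steps:\n{cand}\n\n"
        "For every candidate step, identify the reference step(s) it aligns to, "
        "give a brief reasoning for the judgement, and assign a label from "
        f"{list(LABELS)}. Respond with a JSON list of objects with keys "
        '"step", "aligned_reference_steps", "reasoning", "label".'
    )

def annotate_solution(question: str,
                      reference_steps: list[str],
                      candidate_steps: list[str],
                      call_judge_llm: Callable[[str], str]) -> list[dict]:
    """One judge call -> per-step labels usable as process-supervision targets
    (e.g. for reward-model training or offline RL fine-tuning)."""
    prompt = build_annotation_prompt(question, reference_steps, candidate_steps)
    records = json.loads(call_judge_llm(prompt))
    # Keep only well-formed fields; default unknown labels to "incorrect".
    return [
        {
            "step": r.get("step"),
            "aligned_reference_steps": r.get("aligned_reference_steps", []),
            "reasoning": r.get("reasoning", ""),
            "label": r.get("label") if r.get("label") in LABELS else "incorrect",
        }
        for r in records
    ]

Making a single judge call per candidate solution, rather than expanding a search tree per step, is what underlies the runtime advantage over tree search-based annotation reported in the abstract.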

@article{rizvi2025_2506.15498,
  title={SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling},
  author={Md Imbesat Hassan Rizvi and Xiaodan Zhu and Iryna Gurevych},
  journal={arXiv preprint arXiv:2506.15498},
  year={2025}
}