Many recent approaches to structured NLP tasks use an autoregressive language model to map unstructured input text to output text representing structured objects (such as tuples, lists, trees, code, etc.), where the desired output structure is enforced via constrained decoding. During training, these approaches do not require the model to be aware of the constraints, which are merely implicit in the training outputs. This is advantageous as it allows for dynamic constraints without requiring retraining, but it can lead to low-quality output during constrained decoding at test time. We overcome this problem with Boosted Constrained Decoding (BoostCD), which combines constrained and unconstrained decoding in two phases. In phase 1, the base model decodes twice, once in constrained and once in unconstrained mode, obtaining two weak predictions. In phase 2, a learned autoregressive boosted model combines the two weak predictions into one final prediction. The mistakes made by the base model with vs. without constraints tend to be complementary, which the boosted model learns to exploit for improved performance. We demonstrate the power of BoostCD by applying it to closed information extraction. Our model, BoostIE, outperforms prior approaches both in and out of distribution, addressing several common errors identified in those approaches.
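To make the two-phase pipeline concrete, below is a minimal sketch using Hugging Face Transformers. The model names, the placeholder constraint function, the prompt format fed to the boosted model, and the use of constraints in phase 2 are all illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch of the BoostCD two-phase pipeline described in the abstract.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

base_name = "t5-base"    # assumed base model checkpoint
boost_name = "t5-base"   # assumed boosted model (fine-tuned to combine weak predictions)

tok = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForSeq2SeqLM.from_pretrained(base_name)
boost = AutoModelForSeq2SeqLM.from_pretrained(boost_name)

def allowed_tokens(batch_id, prefix_ids):
    # Placeholder: a real constraint would restrict decoding to token
    # sequences forming valid structured output (e.g. well-formed triples).
    return list(range(tok.vocab_size))

def boostcd(text: str) -> str:
    inputs = tok(text, return_tensors="pt")

    # Phase 1a: unconstrained decoding from the base model (first weak prediction).
    y_free = base.generate(**inputs, max_new_tokens=128)

    # Phase 1b: constrained decoding from the same base model (second weak prediction),
    # enforced here via prefix_allowed_tokens_fn.
    y_con = base.generate(**inputs, max_new_tokens=128,
                          prefix_allowed_tokens_fn=allowed_tokens)

    # Phase 2: the boosted model sees the input plus both weak predictions
    # and produces the final prediction. Applying constraints again here
    # is an assumption of this sketch.
    combined = (f"input: {text} "
                f"unconstrained: {tok.decode(y_free[0], skip_special_tokens=True)} "
                f"constrained: {tok.decode(y_con[0], skip_special_tokens=True)}")
    boost_inputs = tok(combined, return_tensors="pt")
    y_final = boost.generate(**boost_inputs, max_new_tokens=128,
                             prefix_allowed_tokens_fn=allowed_tokens)
    return tok.decode(y_final[0], skip_special_tokens=True)
```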
@article{šakota2025_2506.14901,
  title={Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information Extraction},
  author={Marija Šakota and Robert West},
  journal={arXiv preprint arXiv:2506.14901},
  year={2025}
}