
Syntactic Control of Language Models by Posterior Inference

Main: 9 pages · Appendix: 3 pages · Bibliography: 4 pages · 8 figures · 4 tables
Abstract

Controlling the syntactic structure of text generated by language models is valuable for applications requiring clarity, stylistic consistency, or interpretability, yet it remains a challenging task. In this paper, we argue that sampling algorithms based on posterior inference can effectively enforce a target constituency structure during generation. Our approach combines sequential Monte Carlo, which estimates the posterior distribution by sampling from a proposal distribution, with a syntactic tagger that ensures each generated token aligns with the desired syntactic structure. Our experiments with GPT2 and Llama3-8B models show that with an appropriate proposal distribution, we can improve syntactic accuracy, increasing the F1 score from 12.31 (GPT2-large) and 35.33 (Llama3-8B) to about 93 in both cases without compromising the language model's fluency. These results underscore both the complexity of syntactic control and the effectiveness of sampling algorithms, offering a promising approach for applications where precise control over syntax is essential.
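The sampler described in the abstract can be read as a particle filter over partial generations: tokens are proposed from a proposal distribution, reweighted by the language model and a syntactic tagger, and resampled. The Python sketch below illustrates that idea only; lm_logprobs, proposal_logprobs, and tagger_score are hypothetical stand-ins for the language model, the syntax-aware proposal, and the constituency tagger, and the code is not the authors' implementation.

import math
import random

def smc_generate(lm_logprobs, proposal_logprobs, tagger_score,
                 vocab, target_tree, num_particles=16, max_len=20):
    # Each particle is a partial token sequence paired with a log importance weight.
    particles = [([], 0.0) for _ in range(num_particles)]
    for _ in range(max_len):
        extended = []
        for tokens, logw in particles:
            # Propose the next token from q(x_t | x_<t, target structure).
            q = proposal_logprobs(tokens, vocab, target_tree)
            token = random.choices(vocab, weights=[math.exp(q[t]) for t in vocab], k=1)[0]
            # Importance weight: LM probability over proposal probability, times
            # the tagger's score (assumed in (0, 1]) for how well the extended
            # prefix fits the target constituency structure.
            p = lm_logprobs(tokens, vocab)
            logw += (p[token] - q[token]
                     + math.log(tagger_score(tokens + [token], target_tree)))
            extended.append((tokens + [token], logw))
        # Multinomial resampling: keep particles in proportion to their weights.
        max_logw = max(lw for _, lw in extended)
        weights = [math.exp(lw - max_logw) for _, lw in extended]
        resampled = random.choices(extended, weights=weights, k=num_particles)
        particles = [(tokens, 0.0) for tokens, _ in resampled]
    # After repeated reweighting and resampling, the surviving particles approximate
    # samples from the posterior over sequences consistent with the target structure.
    return random.choice(particles)[0]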

@article{xefteri2025_2506.07154,
  title={Syntactic Control of Language Models by Posterior Inference},
  author={Vicky Xefteri and Tim Vieira and Ryan Cotterell and Afra Amini},
  journal={arXiv preprint arXiv:2506.07154},
  year={2025}
}