
Configurable Preference Tuning with Rubric-Guided Synthetic Data

Main: 5 pages · Bibliography: 1 page · Appendix: 3 pages · 1 figure · 6 tables
Abstract

Models of human feedback for AI alignment, such as those underpinning Direct Preference Optimization (DPO), often bake in a singular, static set of preferences, limiting adaptability. This paper challenges the assumption of monolithic preferences by introducing Configurable Preference Tuning (CPT), a novel framework for endowing language models with the ability to dynamically adjust their behavior based on explicit, human-interpretable directives. CPT leverages synthetically generated preference data, conditioned on system prompts derived from structured, fine-grained rubrics that define desired attributes such as writing style. By fine-tuning with these rubric-guided preferences, the LLM learns to modulate its outputs at inference time in response to the system prompt, without retraining. This approach not only offers fine-grained control but also provides a mechanism for modeling more nuanced and context-dependent human feedback. Several experimental artifacts, such as the training code, generated datasets, and fine-tuned models, are released at this https URL.
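To make the data-construction idea concrete, below is a minimal, hypothetical sketch of how rubric-conditioned preference pairs could be assembled for DPO-style fine-tuning. The rubric text, attribute levels, field names, and system-prompt formatting are illustrative assumptions, not the paper's released format.

# Hypothetical sketch: building rubric-conditioned preference pairs for
# DPO-style fine-tuning. All names and formats here are assumptions.
from dataclasses import dataclass

@dataclass
class RubricLevel:
    score: int        # position on the rubric scale
    directive: str    # human-interpretable system prompt derived from the rubric

# A toy rubric for one stylistic attribute (e.g., formality of writing style).
FORMALITY_RUBRIC = [
    RubricLevel(1, "Respond in a casual, conversational tone."),
    RubricLevel(5, "Respond in a formal, academic register."),
]

def make_preference_example(user_prompt: str,
                            target: RubricLevel,
                            response_on_target: str,
                            response_off_target: str) -> dict:
    """Build one preference record: the same user prompt, conditioned on the
    rubric-derived directive; the response that satisfies the directive is
    'chosen', the one that violates it is 'rejected'."""
    return {
        "prompt": f"<system>{target.directive}</system>\n{user_prompt}",
        "chosen": response_on_target,
        "rejected": response_off_target,
    }

if __name__ == "__main__":
    example = make_preference_example(
        user_prompt="Explain why the sky is blue.",
        target=FORMALITY_RUBRIC[1],
        response_on_target="Rayleigh scattering preferentially disperses shorter wavelengths...",
        response_off_target="So basically the air bounces the blue light around way more...",
    )
    print(example)  # records like this would be collected into a preference dataset

Records of this shape can then be fed to any standard DPO trainer; at inference time, selecting a different rubric level simply means swapping the system directive, with no further training.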

@article{gallego2025_2506.11702,
  title={Configurable Preference Tuning with Rubric-Guided Synthetic Data},
  author={Víctor Gallego},
  journal={arXiv preprint arXiv:2506.11702},
  year={2025}
}