Pairwise Calibrated Rewards for Pluralistic Alignment

17 May 2025
Daniel Halpern
Evi Micha
Ariel D. Procaccia
Itai Shapira
arXiv (abs) · PDF · HTML
4 figures · 5 tables · 27-page appendix
Abstract

Current alignment pipelines presume a single, universal notion of desirable behavior. However, human preferences often diverge across users, contexts, and cultures. As a result, disagreement collapses into the majority signal and minority perspectives are discounted. To address this, we propose reflecting diverse human preferences through a distribution over multiple reward functions, each inducing a distinct aligned policy. The distribution is learned directly from pairwise preferences, without annotator identifiers or predefined groups. Instead, annotator disagreements are treated as informative soft labels. Our central criterion is pairwise calibration: for every pair of candidate responses, the proportion of reward functions preferring one response matches the fraction of annotators with that preference. We prove that even a small outlier-free ensemble can accurately represent diverse preference distributions. Empirically, we introduce and validate a practical training heuristic to learn such ensembles, and demonstrate its effectiveness through improved calibration, implying a more faithful representation of pluralistic values.
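
The pairwise calibration criterion from the abstract can be made concrete with a small numerical check. The sketch below is illustrative only (not the authors' implementation): assuming an ensemble of reward models, a list of candidate response pairs, and the fraction of annotators preferring the first response in each pair, it reports how far the ensemble's preference proportions deviate, on average, from the annotator fractions. The names (pairwise_calibration_error, reward_models, annotator_fractions) are hypothetical placeholders.

import numpy as np

def pairwise_calibration_error(reward_models, pairs, annotator_fractions):
    """Estimate the average pairwise calibration gap of a reward ensemble.

    reward_models: list of callables r(response) -> float (scalar reward)
    pairs: list of (response_a, response_b) tuples
    annotator_fractions: iterable of floats in [0, 1], the fraction of
        annotators preferring response_a over response_b for each pair
    """
    gaps = []
    for (a, b), p_ab in zip(pairs, annotator_fractions):
        # Proportion of ensemble members whose reward prefers response a.
        ensemble_pref = np.mean([float(r(a) > r(b)) for r in reward_models])
        # Calibration asks this proportion to match the annotator fraction.
        gaps.append(abs(ensemble_pref - p_ab))
    return float(np.mean(gaps))

A perfectly calibrated ensemble in this sense would drive the returned value to zero on the pairs it is evaluated on.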

@article{halpern2025_2506.06298,
  title={Pairwise Calibrated Rewards for Pluralistic Alignment},
  author={Daniel Halpern and Evi Micha and Ariel D. Procaccia and Itai Shapira},
  journal={arXiv preprint arXiv:2506.06298},
  year={2025}
}