
HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation

Main: 13 pages
2 figures
27 tables
Appendix: 28 pages
Abstract

Alignment algorithms are widely used to align large language models (LLMs) to human users based on preference annotations. Typically these (often divergent) preferences are aggregated over a diverse set of users, resulting in fine-tuned models that are aligned to the "average-user" preference. Nevertheless, current models are used by individual users in very specific contexts and situations, emphasizing the need for user-dependent preference control. In this work we address the problem of personalizing LLM outputs to their users. We aim to generate customized responses tailored to specific individuals instead of generic outputs that emulate the collective voices of diverse populations. We propose HyPerAlign, an interpretable and sample-efficient hypothesis-driven personalization approach for LLMs. Given few-shot examples written by a particular user, we first infer hypotheses about their communication strategies, personality, and writing style, then prompt LLMs with these hypotheses and user-specific attributes to generate customized outputs. We conduct experiments on two different personalization tasks, namely authorship attribution and deliberative alignment, with datasets from diverse domains (news articles, blog posts, emails, jailbreaking benchmarks). Results demonstrate the superiority of hypothesis-driven LLM personalization over preference-based fine-tuning methods. For authorship attribution, HyPerAlign generations achieve consistently high win rates (commonly >90%) against state-of-the-art preference fine-tuning approaches across diverse user profiles and LLMs. For deliberative alignment, the helpfulness of LLMs is improved by up to 70% on average. Overall, HyPerAlign represents an interpretable and sample-efficient strategy for personalizing LLMs to individual users.
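The abstract describes a two-stage pipeline: infer hypotheses about a user from a few of their writing samples, then condition generation on those hypotheses plus user-specific attributes. The Python sketch below illustrates only that overall shape; it is not the authors' implementation, and the chat() helper and all prompt wording are assumptions standing in for whatever LLM API and prompts the paper actually uses.

def chat(prompt: str) -> str:
    """Placeholder for a call to any instruction-tuned LLM; wire this to
    your provider of choice. Purely illustrative."""
    raise NotImplementedError("connect to an LLM API here")

def infer_user_hypotheses(user_examples: list[str]) -> str:
    """Stage 1 (sketch): from a few texts written by one user, elicit
    natural-language hypotheses about their communication strategies,
    personality, and writing style."""
    samples = "\n\n".join(
        f"Example {i + 1}:\n{text}" for i, text in enumerate(user_examples)
    )
    prompt = (
        "Read the following texts written by a single user. State explicit "
        "hypotheses about their communication strategies, personality "
        f"traits, and writing style.\n\n{samples}"
    )
    return chat(prompt)

def generate_personalized(task: str, hypotheses: str) -> str:
    """Stage 2 (sketch): prompt the LLM with the inferred hypotheses so the
    output is tailored to this user rather than an 'average' user."""
    prompt = (
        f"User profile (inferred hypotheses):\n{hypotheses}\n\n"
        "Complete the task below in this user's voice.\n\n"
        f"Task: {task}"
    )
    return chat(prompt)

if __name__ == "__main__":
    examples = ["<a few short texts written by the target user>"]
    hypotheses = infer_user_hypotheses(examples)
    print(generate_personalized("Draft a short post about remote work.", hypotheses))

Because personalization happens entirely through prompting with human-readable hypotheses, the approach needs no gradient updates (hence sample-efficient) and the inferred profile can be inspected or edited directly (hence interpretable).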

@article{garbacea2025_2505.00038,
  title={HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation},
  author={Cristina Garbacea and Chenhao Tan},
  journal={arXiv preprint arXiv:2505.00038},
  year={2025}
}
