Large Language Models (LLMs) are expected to be predictable and trustworthy to support reliable decision-making systems. Yet current LLMs often show inconsistencies in their judgments. In this work, we examine logical preference consistency as a foundational requirement for building more dependable LLM systems, ensuring stable and coherent decision-making while minimizing erratic or contradictory outputs. To quantify logical preference consistency, we propose a universal evaluation framework based on three fundamental properties: transitivity, commutativity, and negation invariance. Through extensive experimentation across diverse LLMs, we demonstrate that these properties serve as strong indicators of judgment robustness. Furthermore, we introduce a data refinement and augmentation technique, REPAIR, that enhances logical consistency while maintaining alignment with human preferences. Finally, we show that improving consistency leads to better performance in LLM-driven logic-based algorithms, reinforcing stability and coherence in decision-making systems.
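To make the three properties concrete, the following is a minimal, hypothetical sketch of how one might score an LLM judge's pairwise preference judgments for transitivity, commutativity, and negation invariance. The data layout (`prefers` / `neg_prefers` dictionaries) and the exact metric definitions are illustrative assumptions, not the paper's formulation.

```python
# Illustrative sketch (assumed metrics, not the paper's exact definitions):
# `prefers[(a, b)]` is True iff the judge, asked "is a better than b?", said yes;
# an entry is assumed to exist for every ordered pair of distinct items.
from itertools import combinations


def transitivity_rate(prefers):
    """Fraction of item triads whose judgments contain no preference cycle."""
    items = sorted({x for pair in prefers for x in pair})
    triads = list(combinations(items, 3))
    if not triads:
        return 1.0
    acyclic = 0
    for a, b, c in triads:
        ab, bc, ca = prefers[(a, b)], prefers[(b, c)], prefers[(c, a)]
        # A cycle arises exactly when all three judgments point the same way
        # around the triad (a > b > c > a, or the reverse cycle).
        if not (ab == bc == ca):
            acyclic += 1
    return acyclic / len(triads)


def commutativity_rate(prefers):
    """Fraction of unordered pairs whose verdict does not depend on the order
    in which the two items are presented to the judge."""
    pairs = [(a, b) for (a, b) in prefers if a < b]
    stable = sum(prefers[(a, b)] != prefers[(b, a)] for a, b in pairs)
    return stable / len(pairs) if pairs else 1.0


def negation_invariance_rate(prefers, neg_prefers):
    """Fraction of queries where the answer to the negated question
    ("is a worse than b?", stored in `neg_prefers`) is the logical
    complement of the answer to the original question."""
    keys = prefers.keys() & neg_prefers.keys()
    invariant = sum(prefers[k] != neg_prefers[k] for k in keys)
    return invariant / len(keys) if keys else 1.0


if __name__ == "__main__":
    # Toy judgments over three candidate responses A, B, C.
    prefers = {("A", "B"): True, ("B", "A"): False,
               ("B", "C"): True, ("C", "B"): False,
               ("A", "C"): True, ("C", "A"): False}
    neg_prefers = {k: not v for k, v in prefers.items()}  # perfectly consistent judge
    print(transitivity_rate(prefers),                       # 1.0
          commutativity_rate(prefers),                      # 1.0
          negation_invariance_rate(prefers, neg_prefers))   # 1.0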
@article{liu2025_2410.02205,
  title={Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models},
  author={Yinhong Liu and Zhijiang Guo and Tianya Liang and Ehsan Shareghi and Ivan Vulić and Nigel Collier},
  journal={arXiv preprint arXiv:2410.02205},
  year={2025}
}