
Towards Large Language Models with Self-Consistent Natural Language Explanations

9 June 2025
Sahar Admoni
Ofra Amir
Assaf Hallak
Yftah Ziser
Main: 8 pages · 6 figures · 6 tables · Bibliography: 2 pages · Appendix: 3 pages
Abstract

Large language models (LLMs) seem to offer an easy path to interpretability: just ask them to explain their decisions. Yet, studies show that these post-hoc explanations often misrepresent the true decision process, as revealed by mismatches in feature importance. Despite growing evidence of this inconsistency, no systematic solutions have emerged, partly due to the high cost of estimating feature importance, which limits evaluations to small datasets. To address this, we introduce the Post-hoc Self-Consistency Bank (PSCB), a large-scale benchmark of decisions spanning diverse tasks and models, each paired with LLM-generated explanations and corresponding feature importance scores. Analysis of PSCB reveals that self-consistency scores barely differ between correct and incorrect predictions. We also show that the standard self-consistency metric fails to meaningfully distinguish between explanations. To overcome this limitation, we propose an alternative metric that more effectively captures variation in explanation quality. We use it to fine-tune LLMs via Direct Preference Optimization (DPO), leading to significantly better alignment between explanations and decision-relevant features, even under domain shift. Our findings point to a scalable path toward more trustworthy, self-consistent LLMs.
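
The abstract's final step, preference-tuning an LLM toward more self-consistent explanations with DPO, can be illustrated with a short sketch. The snippet below is not the authors' code: the consistency_score heuristic, the toy record, and the model name are placeholder assumptions. It only shows how consistency-ranked explanation pairs could feed Hugging Face TRL's DPOTrainer; the paper's actual metric compares feature-importance profiles of decisions and explanations.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer


def consistency_score(important_tokens, explanation):
    """Toy stand-in for the paper's metric: the fraction of decision-relevant
    tokens that the explanation actually mentions (higher is better)."""
    hits = sum(tok.lower() in explanation.lower() for tok in important_tokens)
    return hits / max(len(important_tokens), 1)


def build_preference_pairs(records):
    """Rank candidate explanations per decision by consistency; keep the best
    as 'chosen' and the worst as 'rejected' (the format DPOTrainer expects)."""
    rows = {"prompt": [], "chosen": [], "rejected": []}
    for rec in records:
        ranked = sorted(rec["candidates"],
                        key=lambda e: consistency_score(rec["important_tokens"], e))
        rows["prompt"].append(rec["prompt"])
        rows["chosen"].append(ranked[-1])
        rows["rejected"].append(ranked[0])
    return Dataset.from_dict(rows)


# Tiny illustrative record; a real run would use PSCB-scale data.
records = [{
    "prompt": "Review: 'The plot dragged but the acting was superb.' Sentiment? Explain.",
    "important_tokens": ["acting", "superb"],
    "candidates": [
        "Positive, because the acting is described as superb.",
        "Positive, because most reviews on this site are positive.",
    ],
}]

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative choice; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

config = DPOConfig(output_dir="dpo-self-consistency", beta=0.1,
                   per_device_train_batch_size=1, num_train_epochs=1)
trainer = DPOTrainer(model=model, args=config,
                     train_dataset=build_preference_pairs(records),
                     processing_class=tokenizer)  # use tokenizer= on older TRL releases
trainer.train()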

@article{admoni2025_2506.07523,
  title={Towards Large Language Models with Self-Consistent Natural Language Explanations},
  author={Sahar Admoni and Ofra Amir and Assaf Hallak and Yftah Ziser},
  journal={arXiv preprint arXiv:2506.07523},
  year={2025}
}