OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs

14 March 2025
Ivan Kartáč
Mateusz Lango
Ondřej Dušek
Abstract

Large Language Models (LLMs) have demonstrated great potential as evaluators of NLG systems, allowing for high-quality, reference-free, and multi-aspect assessments. However, existing LLM-based metrics suffer from two major drawbacks: reliance on proprietary models to generate training data or perform evaluations, and a lack of fine-grained, explanatory feedback. In this paper, we introduce OpeNLGauge, a fully open-source, reference-free NLG evaluation metric that provides accurate explanations based on error spans. OpeNLGauge is available as a two-stage ensemble of larger open-weight LLMs, or as a small fine-tuned evaluation model, with confirmed generalizability to unseen tasks, domains and aspects. Our extensive meta-evaluation shows that OpeNLGauge achieves competitive correlation with human judgments, outperforming state-of-the-art models on certain tasks while maintaining full reproducibility and providing explanations more than twice as accurate.
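
As a rough illustration of the error-span-based evaluation the abstract describes, the sketch below prompts an open-weight, instruction-tuned LLM to judge one aspect of a generated text and return error spans with explanations and a score. The prompt wording, the JSON schema, the model choice, and the evaluate helper are illustrative assumptions, not the authors' actual prompts, models, or two-stage ensemble.

# Minimal sketch: error-span-based NLG evaluation with an open-weight LLM.
# Prompt wording, JSON schema, and model choice are assumptions for illustration,
# not the OpeNLGauge setup described in the paper.
import json
from transformers import pipeline

# Any instruction-tuned open-weight model can stand in here (assumption).
generator = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")

PROMPT_TEMPLATE = """You are an evaluator of natural language generation.
Task input:
{source}

System output:
{output}

Evaluate the output on the aspect "{aspect}".
Return JSON with:
- "errors": a list of objects, each with "span" (exact text from the output),
  "explanation" (why it is an error), and "severity" ("minor" or "major")
- "score": an integer from 1 (worst) to 5 (best)
"""

def evaluate(source: str, output: str, aspect: str) -> dict:
    """Ask the model for an error-span-based judgment and parse its JSON reply."""
    prompt = PROMPT_TEMPLATE.format(source=source, output=output, aspect=aspect)
    reply = generator(prompt, max_new_tokens=512, return_full_text=False)[0]["generated_text"]
    # The model may wrap the JSON in extra text; extract the first {...} block.
    start, end = reply.find("{"), reply.rfind("}") + 1
    return json.loads(reply[start:end])

if __name__ == "__main__":
    result = evaluate(
        source="name: Blue Spice | food: Italian | area: riverside",
        output="Blue Spice is a Chinese restaurant in the city centre.",
        aspect="accuracy",
    )
    print(json.dumps(result, indent=2))

In this kind of setup, a reference-free judgment is obtained directly from the task input and system output; the returned spans and explanations are what make the score inspectable, which is the property the abstract emphasizes.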

@article{kartáč2025_2503.11858,
  title={OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs},
  author={Ivan Kartáč and Mateusz Lango and Ondřej Dušek},
  journal={arXiv preprint arXiv:2503.11858},
  year={2025}
}