Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric

10 April 2025
Yixin Cao
Jiahao Ying
Yaoning Wang
Xipeng Qiu
Xuanjing Huang
Yugang Jiang
Abstract

Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications, yet current evaluation methods struggle to keep pace with their rapid development. One core challenge of evaluation in the LLM era is the generalization issue: how to infer a model's near-unbounded abilities from inevitably bounded benchmarks. We address this challenge by proposing the Model Utilization Index (MUI), a mechanism-interpretability-enhanced metric that complements traditional performance scores. MUI quantifies the effort a model expends on a task, defined as the proportion of neurons or features activated during inference. Intuitively, a truly capable model should achieve higher performance with lower effort. Extensive experiments across popular LLMs reveal a consistent inverse logarithmic relationship between MUI and performance, which we formulate as the Utility Law. From this law we derive four practical corollaries that (i) guide training diagnostics, (ii) expose data contamination issues, (iii) enable fairer model comparisons, and (iv) inform the design of model-specific dataset diversity. Our code can be found at this https URL.
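As a concrete illustration of the metric, the sketch below computes a thresholded variant of MUI from per-layer activations. The function name model_utilization_index, the activation threshold, and the pooling across layers are illustrative assumptions, not the authors' implementation (their code is at the linked URL).

import torch

def model_utilization_index(activations, threshold=0.0):
    """Fraction of neurons that activate at least once during inference.

    `activations` is a list of per-layer tensors shaped
    (sequence_length, hidden_dim). The "fires above threshold on any
    token" criterion and the pooling across layers are assumptions;
    the paper's exact activation/feature definition may differ.
    """
    activated, total = 0, 0
    for layer_act in activations:
        # Per-neuron flag: did this neuron exceed the threshold on any token?
        fired = (layer_act > threshold).any(dim=0)
        activated += int(fired.sum())
        total += fired.numel()
    return activated / total

# Toy usage: 32 layers, 16 tokens, hidden size 4096.
acts = [torch.randn(16, 4096) for _ in range(32)]
print(f"MUI = {model_utilization_index(acts):.3f}")

Under the Utility Law's reported inverse logarithmic relationship, one plausible (assumed) parameterization is performance ≈ a − b·log(MUI) for fitted constants a and b, so that between two models with equal scores, the one with lower MUI is judged more capable.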

@article{cao2025_2504.07440,
  title={Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric},
  author={Yixin Cao and Jiahao Ying and Yaoning Wang and Xipeng Qiu and Xuanjing Huang and Yugang Jiang},
  journal={arXiv preprint arXiv:2504.07440},
  year={2025}
}