
Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap

26 August 2025
Jun Wang
Ninglun Gu
Kailai Zhang
Zijiao Zhang
Yelun Bao
Jin Yang
Xu-cheng Yin
Liwei Liu
Yihuan Liu
Pengyong Li
Gary G. Yen
Junchi Yan
arXiv:2508.18646 · abs | PDF | HTML
Main: 11 pages, 4 figures, 10 tables; Appendix: 12 pages
Abstract

For Large Language Models (LLMs), a disconnect persists between benchmark performance and real-world utility. Current evaluation frameworks remain fragmented, prioritizing technical metrics while neglecting holistic assessment for deployment. This survey introduces an anthropomorphic evaluation paradigm through the lens of human intelligence, proposing a novel three-dimensional taxonomy: Intelligence Quotient (IQ)-General Intelligence for foundational capacity, Emotional Quotient (EQ)-Alignment Ability for value-based interactions, and Professional Quotient (PQ)-Professional Expertise for specialized proficiency. For practical value, we pioneer a Value-oriented Evaluation (VQ) framework assessing economic viability, social impact, ethical alignment, and environmental sustainability. Our modular architecture integrates six components with an implementation roadmap. Through analysis of 200+ benchmarks, we identify key challenges, including dynamic assessment needs and interpretability gaps. The survey provides actionable guidance for developing LLMs that are technically proficient, contextually relevant, and ethically sound. We maintain a curated repository of open-source evaluation resources at: this https URL.
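The taxonomy described in the abstract maps naturally onto a simple record type. The sketch below is a minimal, hypothetical illustration of how benchmark scores might be grouped under the IQ/EQ/PQ axes and the VQ dimensions; all class names, field names, and the unweighted-mean aggregation are assumptions made for illustration and do not come from the paper or its repository.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class LLMEvaluationProfile:
    """Hypothetical container grouping benchmark scores by evaluation axis."""
    # Anthropomorphic axes from the proposed taxonomy
    iq_general_intelligence: Dict[str, float] = field(default_factory=dict)   # e.g. {"reasoning": 0.71}
    eq_alignment_ability: Dict[str, float] = field(default_factory=dict)      # e.g. {"safety": 0.92}
    pq_professional_expertise: Dict[str, float] = field(default_factory=dict) # e.g. {"medicine": 0.64}
    # Value-oriented Evaluation (VQ) dimensions
    vq_value_assessment: Dict[str, float] = field(default_factory=dict)       # e.g. {"economic_viability": 0.55}

    @staticmethod
    def _mean(scores: Dict[str, float]) -> float:
        """Unweighted mean over one axis; a placeholder aggregation, not the paper's scoring method."""
        return sum(scores.values()) / len(scores) if scores else 0.0

    def summary(self) -> Dict[str, float]:
        """Collapse each axis to a single number for a quick model-to-model comparison."""
        return {
            "IQ": self._mean(self.iq_general_intelligence),
            "EQ": self._mean(self.eq_alignment_ability),
            "PQ": self._mean(self.pq_professional_expertise),
            "VQ": self._mean(self.vq_value_assessment),
        }


# Illustrative usage with made-up scores
profile = LLMEvaluationProfile(
    iq_general_intelligence={"reasoning": 0.71, "knowledge": 0.80},
    eq_alignment_ability={"safety": 0.92, "helpfulness": 0.85},
    pq_professional_expertise={"medicine": 0.64, "law": 0.58},
    vq_value_assessment={"economic_viability": 0.55, "environmental_sustainability": 0.47},
)
print(profile.summary())
```

Keeping the four axes as separate score dictionaries mirrors the survey's point that capability, alignment, domain expertise, and practical value should be reported side by side rather than collapsed into a single leaderboard number.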

View on arXiv