SPHERE: An Evaluation Card for Human-AI Systems

Abstract

In the era of Large Language Models (LLMs), establishing effective evaluation methods and standards for diverse human-AI interaction systems is increasingly challenging. To encourage more transparent documentation and facilitate discussion of design options for human-AI system evaluation, we present SPHERE, an evaluation card that encompasses five key dimensions: 1) What is being evaluated? 2) How is the evaluation conducted? 3) Who is participating in the evaluation? 4) When is the evaluation conducted? 5) How is the evaluation validated? We review 39 human-AI systems using SPHERE, outlining current evaluation practices and areas for improvement, and provide three recommendations for improving the validity and rigor of evaluation practices.

@article{ma2025_2504.07971,
  title={SPHERE: An Evaluation Card for Human-AI Systems},
  author={Qianou Ma and Dora Zhao and Xinran Zhao and Chenglei Si and Chenyang Yang and Ryan Louie and Ehud Reiter and Diyi Yang and Tongshuang Wu},
  journal={arXiv preprint arXiv:2504.07971},
  year={2025}
}
