
ReviewEval: An Evaluation Framework for AI-Generated Reviews

Abstract

The escalating volume of academic research, coupled with a shortage of qualified reviewers, necessitates innovative approaches to peer review. While large language models (LLMs) offer potential for automating this process, their current limitations include superficial critiques, hallucinations, and a lack of actionable insights. This research addresses these challenges by introducing a comprehensive evaluation framework for AI-generated reviews that measures alignment with human evaluations, verifies factual accuracy, assesses analytical depth, and identifies actionable insights. We also propose a novel alignment mechanism that tailors LLM-generated reviews to the unique evaluation priorities of individual conferences and journals. To enhance the quality of these reviews, we introduce a self-refinement loop that iteratively optimizes the LLM's review prompts. Our framework establishes standardized metrics for evaluating AI-based review systems, thereby bolstering the reliability of AI-generated reviews in academic research.
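To make the self-refinement idea concrete, the sketch below shows one plausible shape for a loop that scores each generated review and rewrites the prompt from that feedback. This is an illustration only, not the paper's implementation: `generate_review`, `score_review`, and `revise_prompt` are hypothetical placeholders standing in for an LLM call, an aggregate of the framework's metrics (alignment, factual accuracy, depth, actionability), and a prompt-rewriting step.

```python
# Hypothetical sketch of a prompt self-refinement loop. All function
# names and the scoring scheme are assumptions, not the authors' API.

def generate_review(prompt: str, paper: str) -> str:
    """Placeholder: call an LLM with the review prompt and paper text."""
    raise NotImplementedError

def score_review(review: str, paper: str) -> float:
    """Placeholder: aggregate the framework's metrics into one score
    in [0, 1]."""
    raise NotImplementedError

def revise_prompt(prompt: str, review: str, score: float) -> str:
    """Placeholder: ask an LLM to rewrite the prompt, using the review
    and its score as feedback."""
    raise NotImplementedError

def refine_review_prompt(prompt: str, paper: str,
                         max_iters: int = 5, target: float = 0.9) -> str:
    """Iteratively optimize the review prompt until the generated
    review scores above `target` or the iteration budget runs out."""
    best_prompt, best_score = prompt, float("-inf")
    for _ in range(max_iters):
        review = generate_review(prompt, paper)
        score = score_review(review, paper)
        if score > best_score:
            best_prompt, best_score = prompt, score
        if score >= target:
            break
        prompt = revise_prompt(prompt, review, score)
    return best_prompt
```

Keeping the best-scoring prompt seen so far (rather than the last one) guards against a revision step that degrades review quality; whether the paper's loop does this is an open question from the abstract alone.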

@article{kirtani2025_2502.11736,
  title={ReviewEval: An Evaluation Framework for AI-Generated Reviews},
  author={Chhavi Kirtani and Madhav Krishan Garg and Tejash Prasad and Tanmay Singhal and Murari Mandal and Dhruv Kumar},
  journal={arXiv preprint arXiv:2502.11736},
  year={2025}
}