Enhancing Systematic Reviews with Large Language Models: Using GPT-4 and Kimi

28 April 2025

Dandan Chen Kaptur

Yue Huang

Xuejun Ryan Ji

Yanhui Guo

Bradley Kaptur

Abstract

This study examined the use of GPT-4 and Kimi, two large language models (LLMs), for systematic reviews. We evaluated their performance by comparing LLM-generated codes with human-generated codes from a peer-reviewed systematic review on assessment. Our findings suggest that LLM performance in systematic reviews varies with data volume and question complexity.
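
The comparison of LLM-generated codes with human-generated codes can be illustrated with a minimal sketch. The abstract does not state which agreement measure the authors used, so the two shown here (raw percent agreement and Cohen's kappa) are assumptions, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def percent_agreement(human, llm):
    """Share of items where the LLM code equals the human code."""
    if len(human) != len(llm) or not human:
        raise ValueError("need two non-empty code lists of equal length")
    return sum(h == m for h, m in zip(human, llm)) / len(human)


def cohens_kappa(human, llm):
    """Chance-corrected agreement between two coders (Cohen's kappa)."""
    n = len(human)
    p_observed = percent_agreement(human, llm)
    human_counts, llm_counts = Counter(human), Counter(llm)
    # Expected agreement if both coders assigned labels independently
    p_expected = sum(human_counts[c] * llm_counts[c] for c in human_counts) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for four studies on one review question
human_codes = ["yes", "no", "yes", "yes"]
llm_codes = ["yes", "no", "no", "yes"]

print(percent_agreement(human_codes, llm_codes))
print(cohens_kappa(human_codes, llm_codes))
```

Computing such measures separately per review question and per input size would be one way to see how agreement changes with question complexity and data volume.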

@article{kaptur2025_2504.20276,
  title={Enhancing Systematic Reviews with Large Language Models: Using GPT-4 and Kimi},
  author={Dandan Chen Kaptur and Yue Huang and Xuejun Ryan Ji and Yanhui Guo and Bradley Kaptur},
  journal={arXiv preprint arXiv:2504.20276},
  year={2025}
}
