ResearchTrend.AI


KoLA: Carefully Benchmarking World Knowledge of Large Language Models

15 June 2023
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
Xin Lv
Hao Peng
Zijun Yao
Xiaohan Zhang
Hanming Li
Chun-yan Li
Zheyuan Zhang
Yushi Bai
Yantao Liu
Amy Xin
Nianyi Lin
Kaifeng Yun
Linlu Gong
Jianhui Chen
Zhili Wu
Yunjia Qi
Weikai Li
Yong Guan
Kaisheng Zeng
Ji Qi
Hailong Jin
Jinxin Liu
Yu Gu
Yuan Yao
Ning Ding
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
Abstract

The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus prevalently pre-trained by LLMs, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models, and a unique self-contrast metric for automatically evaluating knowledge hallucination. We evaluate 21 open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.
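The abstract's "overall standard scores" refer to making raw metrics from heterogeneous tasks numerically comparable before aggregation. A minimal sketch of one common way to do this — per-task z-normalization across models, then averaging — is below. This is an illustrative assumption, not KoLA's published formula; all function names and the example numbers are hypothetical.

```python
import math

def standardize_scores(raw: dict[str, list[float]]) -> dict[str, list[float]]:
    """Z-normalize each task's scores across models, so tasks with
    different raw-score ranges contribute on a comparable scale."""
    standardized = {}
    for task, scores in raw.items():
        mean = sum(scores) / len(scores)
        var = sum((s - mean) ** 2 for s in scores) / len(scores)
        std = math.sqrt(var) or 1.0  # guard against constant scores
        standardized[task] = [(s - mean) / std for s in scores]
    return standardized

def overall_scores(raw: dict[str, list[float]]) -> list[float]:
    """Average each model's standardized scores across all tasks."""
    std = standardize_scores(raw)
    n_models = len(next(iter(raw.values())))
    return [sum(std[t][i] for t in std) / len(std) for i in range(n_models)]
```

With two tasks whose raw metrics live on very different scales (say, F1 in [0, 1] and an exact-match count), a model that is best on one task and worst on the other lands near zero overall, which is the comparability the contrastive scoring system aims for.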
