KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Versions: v1, v2, v3 (latest)

15 June 2023
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
Xin Lv
Hao Peng
Zijun Yao
Xiaohan Zhang
Hanming Li
Chun-yan Li
Zheyuan Zhang
Yushi Bai
Yantao Liu
Amy Xin
Nianyi Lin
Kaifeng Yun
Linlu Gong
Jianhui Chen
Zhili Wu
Yunjia Qi
Weikai Li
Yong Guan
Kaisheng Zeng
Ji Qi
Hailong Jin
Jinxin Liu
Yu Gu
Yuan Yao
Ning Ding
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
    ELM, ALM

Papers citing "KoLA: Carefully Benchmarking World Knowledge of Large Language Models"

50 / 56 papers shown
HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions
Dor Tsur
Carol Xuan Long
C. M. Verdun
Hsiang Hsu
Chen
Haim Permuter
Sajani Vithana
Flavio du Pin Calmon
WaLM
32
0
0
06 Jun 2025
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering
Bowen Jiang
Runchuan Zhu
Jiang Wu
Zinco Jiang
Yifan He
...
Haote Yang
Songyang Zhang
Dahua Lin
Lijun Wu
Conghui He
ELM
56
0
0
22 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM, ELM
261
7
0
26 Apr 2025
OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching
Zhangcheng Qiang
Kerry Taylor
Weiqing Wang
Jing Jiang
101
0
0
25 Mar 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao
Yuwei Niu
Fanqing Meng
Hao Li
Changyao Tian
...
Dianqi Li
X. Zhu
Li Yuan
Jifeng Dai
Yu Cheng
MLLM
152
1
0
25 Mar 2025
Trinity: A Modular Humanoid Robot AI System
Jingkai Sun
Qiang Zhang
Gang Han
Wen Zhao
Zhe Yong
Yan He
Jiaxu Wang
Jiahang Cao
Yijie Guo
Renjing Xu
79
1
0
11 Mar 2025
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Yuwei Niu
Munan Ning
Mengren Zheng
Weiyang Jin
Bin Lin
...
Jiaqi Liao
Chaoran Feng
Kunpeng Ning
Bin Zhu
Li Yuan
EGVM
156
26
0
10 Mar 2025
LLMs as Repositories of Factual Knowledge: Limitations and Solutions
Seyed Mahed Mousavi
Simone Alghisi
Giuseppe Riccardi
KELM
107
1
0
22 Jan 2025
One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models
Pengfei Cao
Yuheng Chen
Zhuoran Jin
Yubo Chen
Kang Liu
Jun Zhao
KELM
117
0
0
26 Nov 2024
A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models
Elena Kardanova
Alina Ivanova
Ksenia Tarasova
Taras Pashchenko
Aleksei Tikhoniuk
Elen Yusupova
Anatoly Kasprzhak
Yaroslav Kuzminov
Ekaterina Kruchinskaia
Irina Brun
132
1
0
29 Oct 2024
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
106
4
0
28 Oct 2024
Large Language Model-Enhanced Reinforcement Learning for Generic Bus Holding Control Strategies
Jiajie Yu
Yuhong Wang
Wei Ma
OffRL
175
2
0
14 Oct 2024
Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks
Mathis Pink
Vy A. Vo
Qinyuan Wu
Jianing Mu
Javier S. Turek
Uri Hasson
K. A. Norman
Sebastian Michelmann
Alexander G. Huth
Mariya Toneva
106
2
0
10 Oct 2024
Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives
Zhihu Wang
Shiwan Zhao
Yu Wang
Heyuan Huang
Sitao Xie
Y. Zhang
Jiaxin Shi
Zhixing Wang
H. Li
Junchi Yan
LRM
124
6
0
13 Aug 2024
Structure-aware Domain Knowledge Injection for Large Language Models
Kai-Chun Liu
Ze Chen
Zhihang Fu
Rongxin Jiang
Fan Zhou
Yao-Shen Chen
Yue-bo Wu
Yue Wu
Jieping Ye
87
1
0
23 Jul 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
143
39
0
22 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
184
31
0
28 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM, ALM
166
8
0
20 Jun 2024
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
Zhuoran Jin
Pengfei Cao
Chenhao Wang
Zhitao He
Hongbang Yuan
Jiachun Li
Yubo Chen
Kang Liu
Jun Zhao
KELM, MU
137
26
0
16 Jun 2024
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Rithesh Murthy
Liangwei Yang
Juntao Tan
Tulika Awalgaonkar
Yilun Zhou
...
Zuxin Liu
Ming Zhu
Huan Wang
Caiming Xiong
Silvio Savarese
104
6
0
12 Jun 2024
SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings
MohammadAli SadraeiJavaeri
Ehsaneddin Asgari
A. Mchardy
Hamid R. Rabiee
VLM, AAML
73
0
0
07 Jun 2024
DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning
Shangqing Tu
Kejian Zhu
Yushi Bai
Zijun Yao
Lei Hou
Juanzi Li
107
7
0
06 Jun 2024
PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations
Jiatong Li
Renjun Hu
Kunzhe Huang
Zhuang Yan
Qi Liu
Mengxiao Zhu
Xing Shi
Wei Lin
KELM
110
8
0
30 May 2024
Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
Jiaqing Yuan
Lin Pan
Chung-Wei Hang
Jiang Guo
Jiarong Jiang
Bonan Min
Patrick Ng
Zhiguo Wang
HILM, ELM
87
4
0
24 Apr 2024
PRobELM: Plausibility Ranking Evaluation for Language Models
Moy Yuan
Chenxi Whitehouse
Eric Chamoun
Rami Aly
Andreas Vlachos
189
5
0
04 Apr 2024
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
Yuji Cao
Huan Zhao
Yuheng Cheng
Ting Shu
Guolong Liu
Gaoqi Liang
Junhua Zhao
Yun Li
LLMAG, KELM, OffRL, LM&Ro
137
71
0
30 Mar 2024
MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs
Xianhao Yu
Jiaqi Fu
Renjia Deng
Wenjuan Han
111
6
0
28 Mar 2024
Leveraging Large Language Models for Fuzzy String Matching in Political Science
Yu Wang
54
0
0
27 Mar 2024
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach
Kun Sun
Rong Wang
Anders Sogaard
52
3
0
22 Mar 2024
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
Jio Oh
Soyeon Kim
Junseok Seo
Jindong Wang
Ruochen Xu
Xing Xie
Steven Euijong Whang
79
4
0
08 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
301
22
0
28 Feb 2024
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation
Xunjian Yin
Xu Zhang
Jie Ruan
Xiaojun Wan
ELM
112
24
0
18 Feb 2024
KnowTuning: Knowledge-aware Fine-tuning for Large Language Models
Yougang Lyu
Lingyong Yan
Shuaiqiang Wang
Haibo Shi
D. Yin
Fajie Yuan
Zhumin Chen
Maarten de Rijke
Zhaochun Ren
82
7
0
17 Feb 2024
Measuring and Reducing LLM Hallucination without Gold-Standard Answers
Jiaheng Wei
Yuanshun Yao
Jean-François Ton
Hongyi Guo
Andrew Estornell
Yang Liu
HILM
137
26
0
16 Feb 2024
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
ALM, ELM
108
58
0
15 Feb 2024
How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering
Jinxi Liu
S. Cao
Jiaxin Shi
Tingjian Zhang
Lunyiu Nie
Linmei Hu
Lei Hou
Juanzi Li
ELM
67
4
0
11 Jan 2024
Supervised Knowledge Makes Large Language Models Better In-context Learners
Linyi Yang
Shuibai Zhang
Zhuohao Yu
Guangsheng Bao
Yidong Wang
...
Ruochen Xu
Weirong Ye
Xing Xie
Weizhu Chen
Yue Zhang
152
19
0
26 Dec 2023
Climate Change from Large Language Models
Hongyin Zhu
Prayag Tiwari
ELM
72
7
0
19 Dec 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Shafiq Joty
ELM, CLL, AI4MH, LRM, ALM
154
28
0
28 Nov 2023
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
Xun Liang
Shichao Song
Pengnian Qi
Zhiyu Li
Feiyu Xiong
...
Zhaohui Wy
Dawei He
Peng Cheng
Zhonghao Wang
Haiying Deng
HILM
84
22
0
26 Nov 2023
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng
Xiaozhi Wang
Jianhui Chen
Weikai Li
Yunjia Qi
...
Zhili Wu
Kaisheng Zeng
Bin Xu
Lei Hou
Juanzi Li
92
34
0
15 Nov 2023
Insights into Classifying and Mitigating LLMs' Hallucinations
Alessandro Bruno
P. Mazzeo
Aladine Chetouani
Marouane Tliba
M. A. Kerkouri
HILM
98
11
0
14 Nov 2023
PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain
Wei-wei Zhu
Xiaoling Wang
Huanran Zheng
Mosha Chen
Buzhou Tang
ELM, LM&MA
69
36
0
22 Oct 2023
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
Yuyang Bai
Shangbin Feng
Vidhisha Balachandran
Zhaoxuan Tan
Shiqi Lou
Tianxing He
Yulia Tsvetkov
ELM
95
3
0
15 Oct 2023
Resolving Knowledge Conflicts in Large Language Models
Yike Wang
Shangbin Feng
Heng Wang
Weijia Shi
Vidhisha Balachandran
Tianxing He
Yulia Tsvetkov
108
20
0
02 Oct 2023
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai-xiang Chen
Zongwen Shen
Jidong Ge
ELM, AILaw
134
46
0
28 Sep 2023
Can LLM-Generated Misinformation Be Detected?
Canyu Chen
Kai Shu
DeLMO
197
182
0
25 Sep 2023
ProtoEM: A Prototype-Enhanced Matching Framework for Event Relation Extraction
Zhilei Hu
Zixuan Li
Daozhu Xu
Long Bai
Cheng Jin
Xiaolong Jin
Jiafeng Guo
Xueqi Cheng
59
5
0
22 Sep 2023
Can Large Language Models Understand Real-World Complex Instructions?
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM, LRM, ELM
155
59
0
17 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
126
81
0
13 Sep 2023