arXiv: 2305.11747

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

19 May 2023
Junyi Li
Xiaoxue Cheng
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
    HILM
    VLM

Papers citing "HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models"

50 / 161 papers shown
Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check
Linhao Ye
Zhikai Lei
Jia-Peng Yin
Qin Chen
Jie Zhou
Liang He
3DV
RALM
34
17
0
27 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
57
9
0
25 Mar 2024
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
William James Bolton
Rafael Poyiadzi
Edward R. Morrell
Gabriela van Bergen Gonzalez Bueno
Lea Goetz
43
2
0
21 Mar 2024
Self-Attention Based Semantic Decomposition in Vector Symbolic Architectures
Calvin Yeung
Prathyush P. Poduval
Mohsen Imani
34
1
0
20 Mar 2024
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
Jio Oh
Soyeon Kim
Junseok Seo
Jindong Wang
Ruochen Xu
Xing Xie
Steven Euijong Whang
41
1
0
08 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
54
144
0
05 Mar 2024
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
Hengxing Cai
Xiaochen Cai
Junhan Chang
Sihang Li
Lin Yao
...
Changhong Chen
Zheng Cheng
Zifeng Zhao
Linfeng Zhang
Guolin Ke
ELM
36
24
0
04 Mar 2024
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models
Derong Xu
Ziheng Zhang
Zhihong Zhu
Zhenxi Lin
Qidong Liu
...
Wanyu Wang
Yuyang Ye
Xiangyu Zhao
Yefeng Zheng
Enhong Chen
KELM
32
9
0
28 Feb 2024
Factual consistency evaluation of summarization in the Era of large language models
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
HILM
35
1
0
21 Feb 2024
Measuring and Reducing LLM Hallucination without Gold-Standard Answers
Jiaheng Wei
Yuanshun Yao
Jean-François Ton
Hongyi Guo
Andrew Estornell
Yang Liu
HILM
55
18
0
16 Feb 2024
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
ALM
ELM
64
51
0
15 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
371
0
09 Feb 2024
Factuality of Large Language Models in the Year 2024
Yuxia Wang
Minghan Wang
Muhammad Arslan Manzoor
Fei Liu
Georgi Georgiev
Rocktim Jyoti Das
Preslav Nakov
LRM
HILM
38
22
0
04 Feb 2024
NetLLM: Adapting Large Language Models for Networking
Duo Wu
Xianda Wang
Yaqi Qiao
Zhi Wang
Junchen Jiang
Shuguang Cui
Fangxin Wang
37
30
0
04 Feb 2024
A Survey on Large Language Model Hallucination via a Creativity Perspective
Xuhui Jiang
Yuxing Tian
Fengrui Hua
Chengjin Xu
Yuanzhuo Wang
Jian Guo
LRM
21
22
0
02 Feb 2024
Towards Trustable Language Models: Investigating Information Quality of Large Language Models
Rick Rejeleene
Xiaowei Xu
John R. Talburt
HILM
34
2
0
23 Jan 2024
Benchmarking LLMs via Uncertainty Quantification
Fanghua Ye
Mingming Yang
Jianhui Pang
Longyue Wang
Derek F. Wong
Emine Yilmaz
Shuming Shi
Zhaopeng Tu
ELM
20
47
0
23 Jan 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu
Sanjay Jain
Mohan S. Kankanhalli
HILM
LRM
71
218
0
22 Jan 2024
Hallucination Detection and Hallucination Mitigation: An Investigation
Junliang Luo
Tianyu Li
Di Wu
Michael R. M. Jenkin
Steve Liu
Gregory Dudek
HILM
LLMAG
44
22
0
16 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
60
56
0
11 Jan 2024
AI Hallucinations: A Misnomer Worth Clarifying
Negar Maleki
Balaji Padmanabhan
Kaushik Dutta
28
34
0
09 Jan 2024
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Wendi Cui
Jiaxin Zhang
Zhuohang Li
Lopez Damien
Kamalika Das
Bradley Malin
Kumar Sricharan
22
2
0
04 Jan 2024
Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
Matthew Dahl
Varun Magesh
Mirac Suzgun
Daniel E. Ho
HILM
AILaw
25
73
0
02 Jan 2024
Supervised Knowledge Makes Large Language Models Better In-context Learners
Linyi Yang
Shuibai Zhang
Zhuohao Yu
Guangsheng Bao
Yidong Wang
...
Ruochen Xu
Weirong Ye
Xing Xie
Weizhu Chen
Yue Zhang
21
14
0
26 Dec 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
94
46
0
18 Dec 2023
Context Matters: Data-Efficient Augmentation of Large Language Models for Scientific Applications
Xiang Li
Haoran Tang
Siyu Chen
Ziwei Wang
Anurag Maravi
Marcin Abram
21
0
0
12 Dec 2023
HALO: An Ontology for Representing and Categorizing Hallucinations in Large Language Models
Navapat Nananukul
Mayank Kejriwal
HILM
26
3
0
08 Dec 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Chenyu You
ELM
CLL
AI4MH
LRM
ALM
85
27
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
64
21
0
28 Nov 2023
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
Xun Liang
Shichao Song
Simin Niu
Zhiyu Li
...
Zhaohui Wy
Dawei He
Peng Cheng
Zhonghao Wang
Haiying Deng
HILM
34
19
0
26 Nov 2023
Online Advertisements with LLMs: Opportunities and Challenges
S. Feizi
Mohammadtaghi Hajiaghayi
Keivan Rezaei
Suho Shin
OffRL
26
10
0
11 Nov 2023
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Hongjian Zhou
Fenglin Liu
Boyang Gu
Xinyu Zou
Jinfa Huang
...
Yefeng Zheng
Lei A. Clifton
Zheng Li
Fenglin Liu
David A. Clifton
LM&MA
33
107
0
09 Nov 2023
Evaluating General-Purpose AI with Psychometrics
Xiting Wang
Liming Jiang
Jose Hernandez-Orallo
David Stillwell
Luning Sun
Fang Luo
Xing Xie
AI4MH
ELM
32
12
0
25 Oct 2023
Chainpoll: A high efficacy method for LLM hallucination detection
Robert Friel
Atindriyo Sanyal
LRM
HILM
34
26
0
22 Oct 2023
FactCHD: Benchmarking Fact-Conflicting Hallucination Detection
Xiang Chen
Duanzheng Song
Honghao Gui
Chengxi Wang
Ningyu Zhang
Jiang Yong
Yan Zhang
Chengfei Lv
Dan Zhang
Huajun Chen
HILM
35
14
0
18 Oct 2023
NuclearQA: A Human-Made Benchmark for Language Models for the Nuclear Domain
Anurag Acharya
Sai Munikoti
Aaron Hellinger
Sara Smith
S. Wagle
Sameera Horawalavithana
ELM
27
4
0
17 Oct 2023
Large Language Model Unlearning
Yuanshun Yao
Xiaojun Xu
Yang Liu
MU
41
111
0
14 Oct 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
Cunxiang Wang
Xiaoze Liu
Yuanhao Yue
Xiangru Tang
Tianhang Zhang
...
Linyi Yang
Jindong Wang
Xing Xie
Zheng-Wei Zhang
Yue Zhang
HILM
KELM
51
184
0
11 Oct 2023
CoQuest: Exploring Research Question Co-Creation with an LLM-based Agent
Yiren Liu
Si Chen
Haocong Cheng
Mengxia Yu
Xiao Ran
Andrew Mo
Yiliu Tang
Yun Huang
LLMAG
41
46
0
09 Oct 2023
Evaluating Hallucinations in Chinese Large Language Models
Qinyuan Cheng
Tianxiang Sun
Wenwei Zhang
Siyin Wang
Xiangyang Liu
...
Junliang He
Mianqiu Huang
Zhangyue Yin
Kai Chen
Xipeng Qiu
HILM
ELM
33
25
0
05 Oct 2023
FELM: Benchmarking Factuality Evaluation of Large Language Models
Shiqi Chen
Yiran Zhao
Jinghan Zhang
Ethan Chern
Siyang Gao
Pengfei Liu
Junxian He
HILM
38
33
0
01 Oct 2023
UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities
Hejia Geng
Boxun Xu
Peng Li
ELM
LRM
ReLM
41
1
0
30 Sep 2023
AutoHall: Automated Hallucination Dataset Generation for Large Language Models
Zouying Cao
Yifei Yang
Hai Zhao
HILM
18
8
0
30 Sep 2023
Can LLM-Generated Misinformation Be Detected?
Canyu Chen
Kai Shu
DeLMO
39
158
0
25 Sep 2023
Can Large Language Models Understand Real-World Complex Instructions?
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM
LRM
ELM
98
52
0
17 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
48
77
0
13 Sep 2023
Zero-Resource Hallucination Prevention for Large Language Models
Junyu Luo
Cao Xiao
Fenglong Ma
HILM
29
16
0
06 Sep 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
A. Luu
Wei Bi
Freda Shi
Shuming Shi
RALM
LRM
HILM
46
522
0
03 Sep 2023
ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models
Baolin Zhang
Hai-Yong Xie
Pengfan Du
Junhao Chen
Pengfei Cao
Yubo Chen
Shengping Liu
Kang Liu
Jun Zhao
ELM
ALM
24
1
0
28 Aug 2023
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Ziyi Tang
Ruilin Wang
Weixing Chen
Keze Wang
Yang Liu
Tianshui Chen
Liang Lin
LRM
24
0
0
23 Aug 2023