ResearchTrend.AI
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

19 May 2023
Junyi Li
Xiaoxue Cheng
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
    HILM
    VLM

Papers citing "HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models"

50 / 161 papers shown
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren
Steven Basart
Adam Khoja
Alice Gatti
Long Phan
...
Alexander Pan
Gabriel Mukobi
Ryan H. Kim
Stephen Fitz
Dan Hendrycks
ELM
26
21
0
31 Jul 2024
Cost-Effective Hallucination Detection for LLMs
Simon Valentin
Jinmiao Fu
Gianluca Detommaso
Shaoyuan Xu
Giovanni Zappella
Bryan Wang
HILM
42
4
0
31 Jul 2024
Do LLMs Really Adapt to Domains? An Ontology Learning Perspective
Huu Tan Mai
Cuong Xuan Chu
Heiko Paulheim
33
6
0
29 Jul 2024
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Wenting Zhao
Tanya Goyal
Yu Ying Chiu
Liwei Jiang
Benjamin Newman
...
Khyathi Raghavi Chandu
Ronan Le Bras
Claire Cardie
Yuntian Deng
Yejin Choi
HILM
43
7
0
24 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
45
9
0
22 Jul 2024
The Future of Learning: Large Language Models through the Lens of Students
He Zhang
Jingyi Xie
Chuhao Wu
Jie Cai
ChanMin Kim
John M. Carroll
AI4Ed
44
4
0
17 Jul 2024
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
Qingcheng Zeng
Mingyu Jin
Qinkai Yu
Zhenting Wang
Wenyue Hua
...
Felix Juefei Xu
Kaize Ding
Fan Yang
Ruixiang Tang
Yongfeng Zhang
AAML
44
10
0
15 Jul 2024
Look Within, Why LLMs Hallucinate: A Causal Perspective
He Li
Haoang Chi
Mingyu Liu
Wenjing Yang
LRM
37
5
0
14 Jul 2024
Causality extraction from medical text using Large Language Models (LLMs)
Seethalakshmi Gopalakrishnan
Luciana D. Garbayo
Wlodek Zadrozny
ELM
24
6
0
13 Jul 2024
Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents
Haoyi Xiong
Zhiyuan Wang
Xuhong Li
Jiang Bian
Zeke Xie
Shahid Mumtaz
Laura E. Barnes
LLMAG
36
7
0
11 Jul 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
Yuzhe Gu
Ziwei Ji
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai Chen
HILM
39
5
0
05 Jul 2024
NutriBench: A Dataset for Evaluating Large Language Models on Nutrition Estimation from Meal Descriptions
Andong Hua
Mehak Preet Dhaliwal
Ryan Burke
Yao Qin
41
1
0
04 Jul 2024
Generative Monoculture in Large Language Models
Fan Wu
Emily Black
Varun Chandrasekaran
SyDa
35
3
0
02 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
50
2
0
02 Jul 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory
Hongkang Yang
Zehao Lin
Wenjin Wang
Hao Wu
Zhiyu Li
...
Yu Yu
Kai Chen
Zhiyu Li
Linpeng Tang
Weinan E
50
12
0
01 Jul 2024
Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI
Katherine A. Rosenfeld
Maike Sonnewald
Sonia J. Jindal
Kevin A. McCarthy
Joshua L. Proctor
32
0
0
27 Jun 2024
Dye4AI: Assuring Data Boundary on Generative AI Services
Shu Wang
Kun Sun
Yan Zhai
42
1
0
20 Jun 2024
Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation
Kaikai An
Fangkai Yang
Liqun Li
Junting Lu
Sitao Cheng
...
Lele Cao
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
RALM
46
1
0
19 Jun 2024
TorchOpera: A Compound AI System for LLM Safety
Shanshan Han
Yuhang Yao
Zijian Hu
Dimitris Stripelis
Zhaozhuo Xu
Chaoyang He
LLMAG
44
0
0
16 Jun 2024
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation
A. B. M. A. Rahman
Saeed Anwar
Muhammad Usman
Ajmal Mian
HILM
44
2
0
13 Jun 2024
HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation
Wen Luo
Tianshu Shen
Wei Li
Guangyue Peng
Richeng Xuan
Houfeng Wang
Xi Yang
HILM
33
11
0
11 Jun 2024
A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
Bairu Hou
Yang Zhang
Jacob Andreas
Shiyu Chang
77
5
0
11 Jun 2024
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Chengyuan Deng
Yiqun Duan
Xin Jin
Heng Chang
Yijun Tian
...
Kuofeng Gao
Sihong He
Jun Zhuang
Lu Cheng
Haohan Wang
AILaw
43
16
0
08 Jun 2024
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
Weiqi Wang
Yangqiu Song
LRM
35
8
0
04 Jun 2024
Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs
Fatemeh Shiri
Van Nguyen
Farhad Moghimifar
John Yoo
Gholamreza Haffari
Yuan-Fang Li
ReLM
85
3
0
03 Jun 2024
BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
Jiaqi Xue
Meng Zheng
Yebowen Hu
Fei Liu
Xun Chen
Qian Lou
AAML
SILM
38
25
0
03 Jun 2024
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Tianyang Xu
Shujin Wu
Shizhe Diao
Xiaoze Liu
Xingyao Wang
Yangyi Chen
Jing Gao
LRM
29
27
0
31 May 2024
Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach
Ernesto Quevedo
Jorge Yero
Rachel Koerner
Pablo Rivas
Tomas Cerny
HILM
41
12
0
30 May 2024
WirelessLLM: Empowering Large Language Models Towards Wireless Intelligence
Jiawei Shao
Jingwen Tong
Qiong Wu
Wei Guo
Zijian Li
Zehong Lin
Jun Zhang
34
26
0
27 May 2024
Embedding-Aligned Language Models
Guy Tennenholtz
Yinlam Chow
Chih-Wei Hsu
Lior Shani
Ethan Liang
Craig Boutilier
AIFin
37
1
0
24 May 2024
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng
Han Wang
An Zhang
Tai D. Nguyen
Jun Sun
Tat-Seng Chua
LLMAG
40
14
0
23 May 2024
Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming
Jiaxu Liu
Xiangyu Yin
Sihao Wu
Jianhong Wang
Meng Fang
Xinping Yi
Xiaowei Huang
34
4
0
21 May 2024
Spectral Editing of Activations for Large Language Model Alignment
Yifu Qiu
Zheng Zhao
Yftah Ziser
Anna Korhonen
E. Ponti
Shay B. Cohen
KELM
LLMSV
28
16
0
15 May 2024
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias
Shan Chen
Jack Gallifant
Mingye Gao
Pedro Moreira
Nikolaj Munch
...
Hugo J. W. L. Aerts
Brian Anthony
Leo Anthony Celi
William G. La Cava
Danielle S. Bitterman
37
8
0
09 May 2024
Overcoming LLM Challenges using RAG-Driven Precision in Coffee Leaf Disease Remediation
Dr. Selva Kumar
Mohammed Ajmal Khan
Imadh Ajaz Banday
Manikantha Gada
Vibha Venkatesh
32
3
0
02 May 2024
GPT-4 passes most of the 297 written Polish Board Certification Examinations
Jakub Pokrywka
Jeremi Kaczmarek
Edward Gorzelańczyk
LM&MA
ELM
43
5
0
29 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
84
46
0
23 Apr 2024
LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models
Mouhamed Amine Bouchiha
Quentin Telnoff
Souhail Bakkali
R. Champagnat
Mourad Rabah
Mickael Coustaty
Y. Ghamri-Doudane
LRM
42
3
0
20 Apr 2024
Exploring the landscape of large language models: Foundations, techniques, and challenges
M. Moradi
Ke Yan
David Colwell
Matthias Samwald
Rhona Asgari
OffRL
46
1
0
18 Apr 2024
On the Limitations of Large Language Models (LLMs): False Attribution
Tosin P. Adewumi
Nudrat Habib
Lama Alkhaled
Elisa Barney
HILM
37
7
0
06 Apr 2024
Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
Mahjabin Nahar
Haeseung Seo
Eun-Ju Lee
Aiping Xiong
Dongwon Lee
HILM
37
11
0
04 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
44
20
0
04 Apr 2024
PRobELM: Plausibility Ranking Evaluation for Language Models
Moy Yuan
Chenxi Whitehouse
Eric Chamoun
Rami Aly
Andreas Vlachos
91
4
0
04 Apr 2024
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
Jiawei Zhang
Chejian Xu
Y. Gai
Freddy Lecue
Dawn Song
Bo-wen Li
HILM
29
10
0
03 Apr 2024
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
Alberto Blanco-Justicia
N. Jebreel
Benet Manzanares-Salor
David Sánchez
Josep Domingo-Ferrer
Guillem Collell
Kuan Eeik Tan
KELM
MU
48
17
0
02 Apr 2024
Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided
Hongli Zhan
Allen Zheng
Yoon Kyung Lee
Jina Suh
Junyi Jessy Li
Desmond C. Ong
AI4MH
47
8
0
01 Apr 2024
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
Xiaoze Liu
Feijie Wu
Tianyang Xu
Zhuo Chen
Yichi Zhang
Xiaoqian Wang
Jing Gao
HILM
45
8
0
01 Apr 2024
Token-Efficient Leverage Learning in Large Language Models
Yuanhao Zeng
Min Wang
Yihang Wang
Yingxia Shao
37
0
0
01 Apr 2024
Is Factuality Decoding a Free Lunch for LLMs? Evaluation on Knowledge Editing Benchmark
Baolong Bi
Shenghua Liu
Yiwei Wang
Lingrui Mei
Xueqi Cheng
KELM
41
9
0
30 Mar 2024
Non-Linear Inference Time Intervention: Improving LLM Truthfulness
Jakub Hoscilowicz
Adam Wiacek
Jan Chojnacki
Adam Cieślak
Leszek Michon
Vitalii Urbanevych
Artur Janicki
KELM
30
2
0
27 Mar 2024