HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

19 May 2023
Junyi Li, Xiaoxue Cheng, Wayne Xin Zhao, J. Nie, Ji-Rong Wen
HILM, VLM

Papers citing "HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models"

50 / 161 papers shown
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang, Jintao Guo, Shanshan Zhao, Minghao Fu, Lunhao Duan, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang
DiffM
05 May 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao, Shibo Hong, Xuzhao Li, Jiahao Ying, Yubo Ma, ..., Juanzi Li, Aixin Sun, Xuanjing Huang, Tat-Seng Chua, Tianwei Zhang
ALM, ELM
26 Apr 2025

HalluLens: LLM Hallucination Benchmark
Yejin Bang, Ziwei Ji, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, Pascale Fung
HILM
24 Apr 2025

aiXamine: Simplified LLM Safety and Security
Fatih Deniz, Dorde Popovic, Yazan Boshmaf, Euisuh Jeong, M. Ahmad, Sanjay Chawla, Issa M. Khalil
ELM
21 Apr 2025

FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory
Alessio Buscemi, Daniele Proverbio, A. D. Stefano, Anh Han, German Castignani, Pietro Lió
19 Apr 2025

Sparks of Science: Hypothesis Generation Using Structured Paper Data
Charles O'Neill, Tirthankar Ghosal, Roberta Răileanu, Mike Walmsley, Thang Bui, Kevin Schawinski, I. Ciucă
LRM
17 Apr 2025

HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs
Sharanya Dasgupta, Sujoy Nath, Arkaprabha Basu, Pourya Shamsolmoali, Swagatam Das
HILM
13 Apr 2025

Universal Collection of Euclidean Invariants between Pairs of Position-Orientations
Gijs Bellaard, B. Smets, R. Duits
04 Apr 2025

Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking
Zihan Gu, Ruoyu Chen, Hua Zhang, Yue Hu, Xiaochun Cao
04 Apr 2025

FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia, Chatrik Singh Mangat, Issac Li, Gayatri Krishnakumar
ALM
29 Mar 2025

Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao, Jiashu Qu, Jingyi Tang, Baolong Bi, Yi Liu, Hongyu Chen, Li Liang, Li Su, Qingming Huang
MLLM, VLM, LRM
25 Mar 2025

OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching
Zhangcheng Qiang, Kerry Taylor, Weiqing Wang, Jing Jiang
25 Mar 2025

Language Model Uncertainty Quantification with Attention Chain
Yinghao Li, Rushi Qiang, Lama Moukheiber, Chao Zhang
LRM
24 Mar 2025

Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji, L. Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda
18 Mar 2025

Can Your Uncertainty Scores Detect Hallucinated Entity?
Min-Hsuan Yeh, Max Kamachee, Seongheon Park, Yixuan Li
HILM
17 Feb 2025

HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses
Sujeong Lee, Hayoung Lee, Seongsoo Heo, Wonik Choi
HILM
12 Feb 2025

Breaking Focus: Contextual Distraction Curse in Large Language Models
Yue Huang, Yanbo Wang, Zixiang Xu, Chujie Gao, Siyuan Wu, Jiayi Ye, Xiuying Chen, Pin-Yu Chen, Xuzhi Zhang
AAML
03 Feb 2025

SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models
Diyana Muhammed, Gollam Rabby, Sören Auer
LLMAG, HILM
03 Feb 2025

Iterative Tree Analysis for Medical Critics
Zenan Huang, Mingwei Li, Zheng Zhou, Youxin Jiang
18 Jan 2025

PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms
Yilong Li, Jingyu Liu, Hao Zhang, M Badri Narayanan, Utkarsh Sharma, Shuai Zhang, Pan Hu, Yijing Zeng, Jayaram Raghuram, Suman Banerjee
MQ
10 Jan 2025

Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
Shahrad Mohammadzadeh, Juan D. Guerra, Marco Bonizzato, Reihaneh Rabbany, Golnoosh Farnadi
HILM
08 Jan 2025

The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, ..., Rachana Fellinger, Rui Wang, Zizhao Zhang, Sasha Goldshtein, Dipanjan Das
HILM, ALM
06 Jan 2025

Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown
Lifu Tu, Rui Meng, Chenyu You, Yingbo Zhou, Semih Yavuz
HILM
24 Nov 2024

LLM Hallucination Reasoning with Zero-shot Knowledge Test
Seongmin Lee, Hsiang Hsu, Chun-Fu Chen
LRM
14 Nov 2024

RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation
Ian Poey, Jiajun Liu, Qishuai Zhong, Adrien Chenailler
06 Nov 2024

Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
J. Wu, Tsz Ting Chung, Kai Chen, Dit-Yan Yeung
VLM, LRM
30 Oct 2024

Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings
Yashvir S. Grewal, Edwin V. Bonilla, Thang D. Bui
UQCV
30 Oct 2024

FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation
Farima Fatahi Bayat, Lechen Zhang, Sheza Munir, Lu Wang
HILM
29 Oct 2024

Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models
Mohammad Beigi, Sijia Wang, Ying Shen, Zihao Lin, Adithya Kulkarni, ..., Ming Jin, Jin-Hee Cho, Dawei Zhou, Chang-Tien Lu, Lifu Huang
26 Oct 2024

Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
Yujian Liu, Shiyu Chang, Tommi Jaakkola, Yang Zhang
25 Oct 2024

LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Yujun Zhou, Jingdong Yang, Kehan Guo, Pin-Yu Chen, Tian Gao, ..., Werner Geyer, Nuno Moniz, Nitesh V Chawla, Xiangliang Zhang
18 Oct 2024

Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM Agents
Jiachen Li, Justin Steinberg, Xiwen Li, Akshat Choube, Bingsheng Yao, Dakuo Wang, Elizabeth D. Mynatt, Varun Mishra
18 Oct 2024

Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code
Nan Jiang, Qi Li, Lin Tan, Tianyi Zhang
HILM
13 Oct 2024

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu, Mrinank Sharma, Philip Torr, Shay B. Cohen, David M. Krueger, Fazl Barez
AAML
11 Oct 2024

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
09 Oct 2024

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min-Bin Lin
09 Oct 2024

FactCheckmate: Preemptively Detecting and Mitigating Hallucinations in LMs
Deema Alnuhait, Neeraja Kirtane, Muhammad Khalifa, Hao Peng
HILM, LRM
03 Oct 2024

Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA
Eduard Tulchinskii, Laida Kushnareva, Kristian Kuznetsov, Anastasia Voznyuk, Andrei Andriiainen, Irina Piontkovskaya, Evgeny Burnaev, Serguei Barannikov
03 Oct 2024

FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming, Senthil Purushwalkam, Shrey Pandit, Zixuan Ke, Xuan-Phi Nguyen, Caiming Xiong, Chenyu You
HILM
30 Sep 2024

MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models
Vibhor Agarwal, Yiqiao Jin, Mohit Chandra, Munmun De Choudhury, Srijan Kumar, Nishanth R. Sastry
HILM, LM&MA
29 Sep 2024

HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
Xuefeng Du, Chaowei Xiao, Yixuan Li
HILM
26 Sep 2024

FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs
Bowen Yan, Zhengsong Zhang, Liqiang Jing, Eftekhar Hossain, Xinya Du
20 Sep 2024

Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent
Fatemeh Haji, Mazal Bethany, Maryam Tabar, Jason Chiang, Anthony Rios, Peyman Najafirad
LLMAG, LRM, AI4CE
17 Sep 2024

THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models
Mengfei Liang, Archish Arun, Zekun Wu, Cristian Muñoz, Jonathan Lutch, Emre Kazim, Adriano Soares Koshiyama, Philip C. Treleaven
HILM
17 Sep 2024

Generating API Parameter Security Rules with LLM for API Misuse Detection
Jinghua Liu, Yi Yang, Kai Chen, Miaoqian Lin
14 Sep 2024

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Kexin Chen, Yi Liu, Donghai Hong, Jiaying Chen, Wenhai Wang
18 Aug 2024

Visual Agents as Fast and Slow Thinkers
Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Dongfang Liu
LLMAG, LRM
16 Aug 2024

Zero-shot Factual Consistency Evaluation Across Domains
Raunak Agarwal
HILM
07 Aug 2024

ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yanjie Wang, Alan Yuille, Zhuowan Li, Zilong Zheng
LRM
05 Aug 2024

Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
Jin Gao, Lei Gan, Yuankai Li, Yixin Ye, Dequan Wang
02 Aug 2024