A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation

18 April 2021
Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, W. Dolan
HILM

Papers citing "A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation"

50 / 108 papers shown
Conflicts in Texts: Data, Implications and Challenges
Siyi Liu, Dan Roth
28 Apr 2025
Span-Level Hallucination Detection for LLM-Generated Answers
Passant Elchafei, Mervet Abu-Elkheir
HILM, LRM
25 Apr 2025
HalluLens: LLM Hallucination Benchmark
Yejin Bang, Ziwei Ji, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, Pascale Fung
HILM
24 Apr 2025
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models
Liyi Zhang, Veniamin Veselovsky, R. Thomas McCoy, Thomas L. Griffiths
17 Apr 2025
SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes
Raúl Vázquez, Timothee Mickus, Elaine Zosa, Teemu Vahtola, Jörg Tiedemann, ..., Liane Guillou, Ona de Gibert, Jaione Bengoetxea, Joseph Attieh, Marianna Apidianaki
HILM, VLM, LRM
16 Apr 2025
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation
Xu Zhang, Zhifei Liu, Jiahao Wang, Huixuan Zhang, Fan Xu, Junzhe Zhang, Xiaojun Wan
HILM
14 Apr 2025
HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs
Sharanya Dasgupta, Sujoy Nath, Arkaprabha Basu, Pourya Shamsolmoali, Swagatam Das
HILM
13 Apr 2025
Learning on LLM Output Signatures for gray-box LLM Behavior Analysis
Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, Haggai Maron
18 Mar 2025
HalluCounter: Reference-free LLM Hallucination Detection in the Wild!
Ashok Urlana, Gopichand Kanumolu, Charaka Vinayak Kumar, B. Garlapati, Rahul Mishra
HILM
06 Mar 2025
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang, Y. Zhang, Yinpeng Dong, Hang Su
HILM, KELM
26 Feb 2025
How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
Saad Obaid ul Islam, Anne Lauscher, Goran Glavas
HILM, LRM
21 Feb 2025
Can Your Uncertainty Scores Detect Hallucinated Entity?
Min-Hsuan Yeh, Max Kamachee, Seongheon Park, Yixuan Li
HILM
17 Feb 2025
Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences
Shanshan Han, Salman Avestimehr, Chaoyang He
12 Feb 2025
SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models
Diyana Muhammed, Gollam Rabby, Sören Auer
LLMAG, HILM
03 Feb 2025
Foundations of GenIR
Qingyao Ai, Jingtao Zhan, Y. Liu
06 Jan 2025
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown
Lifu Tu, Rui Meng, Shafiq R. Joty, Yingbo Zhou, Semih Yavuz
HILM
24 Nov 2024
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
Vipula Rawte, Sarthak Jain, Aarush Sinha, Garv Kaushik, Aman Bansal, ..., Aishwarya N. Reganti, Vinija Jain, Aman Chadha, A. Sheth, A. Das
VLM, MLLM
16 Nov 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He, Weizhe Lin, Hao Zheng, Fan Zhang, Matt Jones, Laurence Aitchison, X. Xu, Miao Liu, Per Ola Kristensson, Junxiao Shen
01 Nov 2024
Multilingual Hallucination Gaps in Large Language Models
Cléa Chataigner, Afaf Taik, G. Farnadi
HILM, LRM
23 Oct 2024
Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code
Nan Jiang, Qi Li, Lin Tan, Tianyi Zhang
HILM
13 Oct 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov
HILM, AIFin
03 Oct 2024
Multimodal Coherent Explanation Generation of Robot Failures
Pradip Pramanick, Silvia Rossi
01 Oct 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming, Senthil Purushwalkam, Shrey Pandit, Zixuan Ke, Xuan-Phi Nguyen, Caiming Xiong, Shafiq R. Joty
HILM
30 Sep 2024
Improving Faithfulness of Large Language Models in Summarization via Sliding Generation and Self-Consistency
Taiji Li, Zhi Li, Yin Zhang
HILM
31 Jul 2024
Cost-Effective Hallucination Detection for LLMs
Simon Valentin, Jinmiao Fu, Gianluca Detommaso, Shaoyuan Xu, Giovanni Zappella, Bryan Wang
HILM
31 Jul 2024
Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack
Xiaoyue Xu, Qinyuan Ye, Xiang Ren
23 Jul 2024
ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection
Janek Herrlein, Chia-Chien Hung, Goran Glavas
HILM
18 Jul 2024
Mitigating Entity-Level Hallucination in Large Language Models
Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu
HILM
12 Jul 2024
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva
HILM
08 Jul 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen
HILM
05 Jul 2024
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
Yuyan Chen, Qiang Fu, Yichen Yuan, Zhihao Wen, Ge Fan, Dayiheng Liu, Dongmei Zhang, Zhixu Li, Yanghua Xiao
HILM
04 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi
02 Jul 2024
PFME: A Modular Approach for Fine-grained Hallucination Detection and Editing of Large Language Models
Kunquan Deng, Zeyu Huang, Chen Li, Chenghua Lin, Min Gao, Wenge Rong
KELM
29 Jun 2024
Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI
Katherine A. Rosenfeld, Maike Sonnewald, Sonia J. Jindal, Kevin A. McCarthy, Joshua L. Proctor
27 Jun 2024
Estimating Knowledge in Large Language Models Without Generating a Single Token
Daniela Gottesman, Mor Geva
18 Jun 2024
Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning
Jiaqi Li, Yixuan Tang, Yi Yang
14 Jun 2024
REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy
Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung
HILM
11 Jun 2024
A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
Bairu Hou, Yang Zhang, Jacob Andreas, Shiyu Chang
11 Jun 2024
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
Neeraj Varshney, Satyam Raj, Venkatesh Mishra, Agneet Chatterjee, Ritika Sarkar, Amir Saeidi, Chitta Baral
LRM
08 Jun 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models
Ziwei Ji, Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai-xiang Chen
HILM
30 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback, Jakob Spiegelberg, Hanno Gottschalk
MLLM
29 May 2024
Large Language Models Meet NLP: A Survey
Libo Qin, Qiguang Chen, Xiachong Feng, Yang Wu, Yongheng Zhang, Yinghui Li, Min Li, Wanxiang Che, Philip S. Yu
ALM, LM&MA, ELM, LRM
21 May 2024
RDRec: Rationale Distillation for LLM-based Recommendation
Xinfeng Wang, Jin Cui, Yoshimi Suzuki, Fumiyo Fukumoto
LRM
17 May 2024
Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models
Houjun Liu
LM&Ro, LRM
29 Apr 2024
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Xiaomin Yu, Yezhaohui Wang, Yanfang Chen, Zhen Tao, Dinghao Xi, Shichao Song, Simin Niu, Zhiyu Li
25 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma
OffRL
12 Apr 2024
Know When To Stop: A Study of Semantic Drift in Text Generation
Ava Spataru, Eric Hambro, Elena Voita, Nicola Cancedda
08 Apr 2024
SLPL SHROOM at SemEval2024 Task 06: A comprehensive study on models ability to detect hallucination
Pouya Fallah, S. Gooran, Mohammad Jafarinasab, Pouya Sadeghi, Reza Farnia, Amirreza Tarabkhah, Zainab Sadat Taghavi, Hossein Sameti
HILM
07 Apr 2024
Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso, Martín Bertrán, Riccardo Fogliato, Aaron Roth
06 Apr 2024
Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee
HILM
04 Apr 2024