ResearchTrend.AI

Hallucination Detection in Large Language Models with Metamorphic Relations
arXiv:2502.15844, 20 February 2025
Borui Yang, Md Afif Al Mamun, Jie M. Zhang, Gias Uddin. [HILM]

Papers citing "Hallucination Detection in Large Language Models with Metamorphic Relations"

36 papers shown:

  • Mutation-Guided LLM-based Test Generation at Meta (22 Jan 2025). Christopher Foster, Abhishek Gulati, Mark Harman, Inna Harper, Ke Mao, Jillian Ritchey, Hervé Robert, Shubho Sengupta.
  • LiveBench: A Challenging, Contamination-Limited LLM Benchmark (27 Jun 2024). Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, ..., Willie Neiswanger, Micah Goldblum, Tom Goldstein. [ELM]
  • Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation (08 Jun 2024). Neeraj Varshney, Satyam Raj, Venkatesh Mishra, Agneet Chatterjee, Ritika Sarkar, Amir Saeidi, Chitta Baral. [LRM]
  • Large Language Models Meet NLP: A Survey (21 May 2024). Libo Qin, Qiguang Chen, Xiachong Feng, Yang Wu, Yongheng Zhang, Hai-Tao Zheng, Min Li, Wanxiang Che, Philip S. Yu. [ALM, LM&MA, ELM, LRM]
  • Strong hallucinations from negation and how to fix them (16 Feb 2024). Nicholas Asher, Swarnadeep Bhar. [ReLM, LRM]
  • Hallucination is Inevitable: An Innate Limitation of Large Language Models (22 Jan 2024). Ziwei Xu, Sanjay Jain, Mohan S. Kankanhalli. [HILM, LRM]
  • METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities (11 Dec 2023). Sangwon Hyun, Mingyu Guo, Muhammad Ali Babar.
  • FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation (05 Oct 2023). Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry W. Wei, ..., Chris Tar, Yun-hsuan Sung, Denny Zhou, Quoc Le, Thang Luong. [KELM, HILM, LRM]
  • LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples (02 Oct 2023). Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Munan Ning, Li Yuan. [HILM, LRM, AAML]
  • Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models (03 Sep 2023). Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, ..., Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi. [RALM, LRM, HILM]
  • FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios (25 Jul 2023). Ethan Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu. [HILM]
  • A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation (08 Jul 2023). Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu. [HILM]
  • Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs (22 Jun 2023). Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, Bryan Hooi.
  • Do Language Models Know When They're Hallucinating References? (29 May 2023). A. Agrawal, Mirac Suzgun, Lester W. Mackey, Adam Tauman Kalai. [HILM, LRM]
  • FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation (23 May 2023). Sewon Min, Kalpesh Krishna, Xinxi Lyu, M. Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi. [HILM, ALM]
  • LM vs LM: Detecting Factual Errors via Cross Examination (22 May 2023). Roi Cohen, May Hamri, Mor Geva, Amir Globerson. [HILM]
  • Complex Claim Verification with Evidence Retrieved in the Wild (19 May 2023). Jifan Chen, Grace Kim, Aniruddh Sriram, Greg Durrett, Eunsol Choi. [HILM]
  • ConceptEVA: Concept-Based Interactive Exploration and Customization of Document Summaries (31 Mar 2023). Xiaoyu Zhang, J. Li, Po-Wei Chi, Senthil K. Chandrasegaran, Kwan-Liu Ma.
  • ChatGPT as a Factual Inconsistency Evaluator for Text Summarization (27 Mar 2023). Zheheng Luo, Qianqian Xie, Sophia Ananiadou. [ELM, HILM, ALM]
  • SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models (15 Mar 2023). Potsawee Manakul, Adian Liusie, Mark Gales. [HILM, LRM]
  • A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal (12 Dec 2022). K. Liang, Lingyuan Meng, Meng Liu, Yue Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu, Fu Sun. [LRM]
  • RealTime QA: What's the Answer Right Now? (27 Jul 2022). Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Velocity Yu, Dragomir R. Radev, Noah A. Smith, Yejin Choi, Kentaro Inui. [KELM]
  • Language Models (Mostly) Know What They Know (11 Jul 2022). Saurav Kadavath, Tom Conerly, Amanda Askell, T. Henighan, Dawn Drain, ..., Nicholas Joseph, Benjamin Mann, Sam McCandlish, C. Olah, Jared Kaplan. [ELM]
  • An Exploration of Post-Editing Effectiveness in Text Summarization (13 Jun 2022). Vivian Lai, Alison Smith-Renner, Ke Zhang, Ruijia Cheng, Wenjuan Zhang, Joel R. Tetreault, Alejandro Jaimes.
  • Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods (10 Mar 2022). Wei Li, Wenhao Wu, Moye Chen, Jiachen Liu, Xinyan Xiao, Hua Wu. [HILM]
  • TruthfulQA: Measuring How Models Mimic Human Falsehoods (08 Sep 2021). Stephanie C. Lin, Jacob Hilton, Owain Evans. [HILM]
  • A Survey on Automated Fact-Checking (26 Aug 2021). Zhijiang Guo, Michael Schlichtkrull, Andreas Vlachos.
  • A preliminary study on evaluating Consultation Notes with Post-Editing (09 Apr 2021). Francesco Moramarco, Alex Papadopoulos Korfiatis, Aleksandar Savkov, Ehud Reiter.
  • Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs (22 Oct 2020). Hongyu Ren, J. Leskovec. [BDL]
  • BoxE: A Box Embedding Model for Knowledge Base Completion (13 Jul 2020). Ralph Abboud, İsmail İlkan Ceylan, Thomas Lukasiewicz, Tommaso Salvatori.
  • Generating Fact Checking Explanations (13 Apr 2020). Pepa Atanasova, J. Simonsen, Christina Lioma, Isabelle Augenstein.
  • A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking (29 Oct 2019). Andreas Hanselowski, Christian Stab, Claudia Schulz, Zile Li, Iryna Gurevych.
  • Evaluating the Factual Consistency of Abstractive Text Summarization (28 Oct 2019). Wojciech Kryściński, Bryan McCann, Caiming Xiong, R. Socher. [HILM]
  • MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims (07 Sep 2019). Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian B. Hansen, J. Simonsen. [HILM]
  • Machine Learning Testing: Survey, Landscapes and Horizons (19 Jun 2019). Jie M. Zhang, Mark Harman, Lei Ma, Yang Liu. [VLM, AILaw]
  • Get To The Point: Summarization with Pointer-Generator Networks (14 Apr 2017). A. See, Peter J. Liu, Christopher D. Manning. [3DPC]