Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

6 March 2024

Qipeng Guo

Xipeng Qiu

Papers citing "Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem"

Title | Authors | Tags | Counts | Date
The Hallucination Tax of Reinforcement Finetuning | Linxin Song, Taiwei Shi, Jieyu Zhao | HILM, LRM | 12 0 0 | 20 May 2025
HalluLens: LLM Hallucination Benchmark | Yejin Bang, Ziwei Ji, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, Pascale Fung | HILM | 92 1 0 | 24 Apr 2025
A Debate-Driven Experiment on LLM Hallucinations and Accuracy | Ray Li, Tanishka Bagade, Kevin Martinez, Flora Yasmin, Grant Ayala, Michael Lam, Kevin Zhu | HILM | 37 0 0 | 25 Oct 2024
When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems | Asir Saadat, Tasmia Binte Sogir, Md Taukir Azam Chowdhury, Syem Aziz | | 79 1 0 | 16 Oct 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov | HILM, AIFin | 61 29 0 | 03 Oct 2024
When Context Leads but Parametric Memory Follows in Large Language Models | Yufei Tao, Adam Hiatt, Erik Haake, Antonie J. Jetter, Ameeta Agrawal | KELM | 38 0 0 | 13 Sep 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Zihao Zhou, Shudong Liu, Maizhen Ning, Wei Liu, Jindong Wang, Derek F. Wong, Xiaowei Huang, Qiufeng Wang, Kaizhu Huang | ELM, LRM | 71 25 0 | 11 Jul 2024
CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks | Maciej Besta, Lorenzo Paleari, Aleš Kubíček, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, H. Niewiadomski, Torsten Hoefler | | 75 5 0 | 04 Jun 2024
Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 369 12,081 0 | 04 Mar 2022