Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

6 March 2024
Yuhong Sun
Zhangyue Yin
Qipeng Guo
Jiawen Wu
Xipeng Qiu
Hui Zhao

Papers citing "Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem"

9 / 9 papers shown
The Hallucination Tax of Reinforcement Finetuning
Linxin Song
Taiwei Shi
Jieyu Zhao
HILM
LRM
12
0
0
20 May 2025
HalluLens: LLM Hallucination Benchmark
Yejin Bang
Ziwei Ji
Alan Schelten
Anthony Hartshorn
Tara Fowler
Cheng Zhang
Nicola Cancedda
Pascale Fung
HILM
92
1
0
24 Apr 2025
A Debate-Driven Experiment on LLM Hallucinations and Accuracy
Ray Li
Tanishka Bagade
Kevin Martinez
Flora Yasmin
Grant Ayala
Michael Lam
Kevin Zhu
HILM
37
0
0
25 Oct 2024
When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems
Asir Saadat
Tasmia Binte Sogir
Md Taukir Azam Chowdhury
Syem Aziz
79
1
0
16 Oct 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad
Michael Toker
Zorik Gekhman
Roi Reichart
Idan Szpektor
Hadas Kotek
Yonatan Belinkov
HILM
AIFin
61
29
0
03 Oct 2024
When Context Leads but Parametric Memory Follows in Large Language Models
Yufei Tao
Adam Hiatt
Erik Haake
Antonie J. Jetter
Ameeta Agrawal
KELM
38
0
0
13 Sep 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Derek F. Wong
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM
LRM
71
25
0
11 Jul 2024
CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks
Maciej Besta
Lorenzo Paleari
Aleš Kubíček
Piotr Nyczyk
Robert Gerstenberger
Patrick Iff
Tomasz Lehmann
H. Niewiadomski
Torsten Hoefler
75
5
0
04 Jun 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
369
12,081
0
04 Mar 2022